Abstract The single-index model is one of the most popular semiparametric models in econometrics. In this paper, we define a quantile regression single-index model, which includes the single-index structure for the conditional mean and for the conditional variance.

1 Introduction

Regression quantiles, along with the dual method of regression rank scores, can be considered one of the major statistical breakthroughs of the past decades. Their advantages over other estimation methods have been well investigated: regression quantile methods provide a much more complete statistical analysis of the stochastic relationships among variables, they are more robust against possible outliers or extreme values, and they can be computed via traditional linear programming methods. Although median regression ideas go back to the 18th century and the work of Laplace, regression quantile methods were first introduced by Koenker and Bassett (1978). The linear regression quantile is very useful, but, like linear regression, it is not flexible enough to capture complicated relations. For quantile regression this disadvantage is even more serious, because the conditional quantiles depend on the conditional variance as well as on the conditional mean. As an example, consider the popular AR(1)-ARCH(1) model:

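$$y_t = a_0 + a_1 y_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad z_t \sim \mathrm{IID},$$
$$\sigma_t^2 = \beta_0 + \beta_1 \varepsilon_{t-1}^2, \qquad \beta_0 > 0, \quad \beta_1 > 0,$$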

which cannot be fitted well by the linear quantile model. 
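To make the failure concrete: conditional on $(y_{t-1}, y_{t-2})$, the $\tau$-th quantile of $y_t$ is $a_0 + a_1 y_{t-1} + \sigma_t Q_\tau(z)$ with $\sigma_t^2 = \beta_0 + \beta_1 (y_{t-1} - a_0 - a_1 y_{t-2})^2$, which is not linear in the lagged values whenever $Q_\tau(z) \ne 0$. The Python sketch below assumes $z_t \sim N(0, 1)$ and uses arbitrary illustrative parameter values (not taken from this paper). It checks the formula by Monte Carlo and shows the curvature in $y_{t-1}$.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameter values chosen only for illustration.
A0, A1 = 0.1, 0.5   # AR(1) part: y_t = a_0 + a_1 y_{t-1} + eps_t
B0, B1 = 0.2, 0.6   # ARCH(1) part: sigma_t^2 = beta_0 + beta_1 eps_{t-1}^2


def conditional_quantile(tau, y_lag1, y_lag2):
    """tau-th quantile of y_t given (y_{t-1}, y_{t-2}), assuming z_t ~ N(0, 1)."""
    eps_lag1 = y_lag1 - A0 - A1 * y_lag2         # eps_{t-1} is recovered from two lags
    sigma_t = np.sqrt(B0 + B1 * eps_lag1 ** 2)   # conditional standard deviation
    return A0 + A1 * y_lag1 + sigma_t * norm.ppf(tau)


def coverage(tau, y_lag1, y_lag2, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(y_t <= q_tau | y_{t-1}, y_{t-2})."""
    rng = np.random.default_rng(seed)
    eps_lag1 = y_lag1 - A0 - A1 * y_lag2
    sigma_t = np.sqrt(B0 + B1 * eps_lag1 ** 2)
    y_t = A0 + A1 * y_lag1 + sigma_t * rng.standard_normal(n_draws)
    return np.mean(y_t <= conditional_quantile(tau, y_lag1, y_lag2))


grid = np.linspace(-2.0, 2.0, 5)
# A function that is linear in y_{t-1} has zero second differences on an even grid.
print(np.diff(conditional_quantile(0.9, grid, 0.0), 2))  # all positive: convex, not linear
print(np.diff(conditional_quantile(0.5, grid, 0.0), 2))  # numerically zero
```

With symmetric $z_t$ the median ($\tau = 0.5$) happens to stay linear in $y_{t-1}$; every other quantile picks up the convex (for $\tau > 0.5$) or concave (for $\tau < 0.5$) term $\sigma_t Q_\tau(z)$.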

In this paper, we focus on an important special case in which the loss function is specified as

$$\rho_\tau(v) = \tau I(v > 0) v + (\tau - 1) I(v \le 0) v, \tag{1}$$

where $0 < \tau < 1$ and $I(\cdot)$ is the indicator function, leading to the $\tau$-th quantile regression; see Koenker and Bassett (1978).
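A minimal Python sketch of the check function (1), assuming only NumPy. Because $\sum_i \rho_\tau(y_i - m)$ is convex and piecewise linear in $m$ with kinks at the observations, minimizing it over the data points recovers a sample $\tau$-th quantile, which is the sample analogue of (2) below with no covariates. The helper names are illustrative.

```python
import numpy as np


def check_loss(v, tau):
    """rho_tau(v) = tau * v for v > 0 and (tau - 1) * v for v <= 0, as in (1)."""
    v = np.asarray(v, dtype=float)
    return np.where(v > 0, tau * v, (tau - 1.0) * v)


def quantile_by_check_loss(y, tau):
    """Minimise sum_i rho_tau(y_i - m) over m.

    The objective is convex and piecewise linear in m with kinks at the
    data points, so a minimiser can always be found among the y_i.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - m, tau).sum() for m in y]
    return y[int(np.argmin(losses))]


y = np.array([3.0, -1.0, 7.0, 2.0, 10.0, 0.5, 4.0])
print(quantile_by_check_loss(y, 0.5))    # 3.0, the sample median
print(quantile_by_check_loss(y, 0.25))   # 0.5, the 2nd order statistic
```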

In a nonparametric setting, we can state the problem as follows. Suppose $Y$ is the response variable and $X \in \mathbb{R}^d$ are the covariates. For the loss function $\rho_\tau(\cdot)$, we are interested in a function $m_\tau(x)$ such that

$$m_\tau(x) = \arg\min_{m(\cdot) \in L_1} E\{\rho_\tau[Y - m(X)] \mid X = x\}. \tag{2}$$

The function $m_\tau(x)$ is called the $\tau$-th nonparametric quantile regression function of $Y$ on $X$. Applications of nonparametric quantile estimation have been intensively investigated in the literature; see, for example, Koenker (2005) and Kong et al (2008). As in nonparametric estimation of the conditional mean function, there is a "curse of dimensionality" in estimating the typically multivariate function $m_\tau(\cdot)$. The dimension reduction approach can thus be applied here by considering

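$$m_\tau(\theta^\top x) = \arg\min E\{\rho_\tau(Y - m(\theta^\top X)) \mid X = x\} \quad \text{with respect to } \theta \in \Theta \text{ and } m(\cdot) \in L_1, \tag{3}$$

where $\Theta = \{\theta : |\theta| = 1\}$. Ideally, we arrive at a single-index quantile model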

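$$Y = m(\theta_0^\top X) + \varepsilon, \qquad E\{\varphi(\varepsilon) \mid X\} = 0 \quad \text{a.s.}, \tag{4}$$

where $\varphi(\cdot)$ is the piecewise derivative of $\rho(\cdot)$ in (1). For $\rho = \rho_\tau$ we have $\varphi(v) = \tau - I(v < 0)$, so the moment condition in (4) states that $P(\varepsilon < 0 \mid X) = \tau$, i.e. $m(\theta_0^\top X)$ is the conditional $\tau$-th quantile of $Y$ given $X$. A typical model is the general single-index model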

$$Y = g(\theta_0^\top X, \varepsilon),$$

where $\varepsilon$ is independent of $X$. Under such a model specification, it is easy to see that

$$m_\tau(x) = g_\tau(\theta_0^\top x) = \min\{v : P(g(\theta_0^\top x, \varepsilon) \le v) \ge \tau\}.$$



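For the conditional heteroscedasticity model, where $g(\theta_0^\top X, \varepsilon) = g(\theta_0^\top X)\varepsilon$ with $g(\cdot) > 0$, we even have

$$m_\tau(x) = g(\theta_0^\top x) Q_\tau(\varepsilon),$$

where $Q_\tau(\varepsilon)$ is the $\tau$-th quantile of $\varepsilon$. An interesting special case of this setting is the ARCH($p$) model in a time series setting, where $X = (y_{t-1}^2, \ldots, y_{t-p}^2)^\top$ and $Y = y_t$.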
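The factorization $m_\tau(x) = g(\theta_0^\top x) Q_\tau(\varepsilon)$ relies on $g(\cdot) > 0$ (for negative $g$ the quantile would switch to $Q_{1-\tau}(\varepsilon)$). Below is a small Monte Carlo check in Python, with a hypothetical index $\theta_0$, a hypothetical scale function $g(v) = \sqrt{0.5 + v^2}$ and standard normal $\varepsilon$; none of these choices come from the paper.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical choices for illustration only.
THETA0 = np.array([1.0, 2.0]) / np.sqrt(5.0)   # unit-norm index vector


def g(v):
    """Positive scale function g(.) of the single index."""
    return np.sqrt(0.5 + v ** 2)


def mc_conditional_quantile(x, tau, n_draws=400_000, seed=1):
    """Monte Carlo tau-th quantile of Y = g(theta0' x) * eps with eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_draws)          # eps is drawn independently of X
    return np.quantile(g(THETA0 @ x) * eps, tau)


def factorised_quantile(x, tau):
    """Closed form g(theta0' x) * Q_tau(eps), valid because g(.) > 0."""
    return g(THETA0 @ x) * norm.ppf(tau)


x = np.array([0.3, -1.2])
print(mc_conditional_quantile(x, 0.8), factorised_quantile(x, 0.8))  # close to each other
```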

Our main focus is the estimation of $\theta_0$. Suppose $\{X_i, Y_i\}_{i=1}^n$ are i.i.d. observations from the underlying model (4). We propose to estimate the index parameter $\theta_0$ by

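$$\hat\theta = \arg\min_{\theta \in \Theta} \min_{a_j, b_j} \sum_{i=1}^{n} \sum_{j=1}^{n} K(\theta^\top X_{ij}/h)\, \rho(Y_i - a_j - b_j \theta^\top X_{ij}), \qquad X_{ij} = X_i - X_j, \tag{5}$$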

where $K(\cdot)$ is a kernel function and $h$ is a bandwidth. The minimization in (5) can be carried out iteratively. First, for any initial estimate $\vartheta \in \Theta$, denote by $[a_\vartheta(x), b_\vartheta(x)]$ the minimizer of

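$$\sum_{i=1}^{n} K(\vartheta^\top X_{ix}/h)\, \rho(Y_i - a - b\, \vartheta^\top X_{ix}) \quad \text{with respect to } a \text{ and } b, \tag{6}$$

where $X_{ix} = X_i - x$. The estimate of $\theta_0$ is then updated by

$$\hat\theta = \arg\min_{\theta \in \Theta} \sum_{i=1}^{n} \sum_{j=1}^{n} K(\vartheta^\top X_{ij}/h)\, \rho\{Y_i - a_\vartheta(X_j) - b_\vartheta(X_j)\, \theta^\top X_{ij}\}. \tag{7}$$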

Repeat (6) and (7) until convergence. The true value $\theta_0$ is then estimated by the standardized final estimate $\hat\theta := \hat\theta/|\hat\theta|$.
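Step (6) is a weighted linear quantile regression of $Y_i$ on $(1, \vartheta^\top X_{ix})$ with kernel weights. The Python sketch below solves it as a linear program with `scipy.optimize.linprog` (HiGHS backend, SciPy 1.6 or later assumed). The Epanechnikov kernel and the function names are our own illustrative choices, not prescribed by the paper.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_quantile_regression(Z, y, w, tau):
    """Minimise sum_i w_i * rho_tau(y_i - Z_i @ beta) over beta (w_i >= 0).

    Linear-programming form: y - Z beta = u - v with u, v >= 0, and
    objective sum_i w_i * (tau * u_i + (1 - tau) * v_i).
    """
    n, p = Z.shape
    c = np.concatenate([np.zeros(p), tau * w, (1.0 - tau) * w])
    A_eq = np.hstack([Z, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def epanechnikov(u):
    """Compactly supported symmetric kernel (one admissible choice under (A3))."""
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def local_linear_fit(X, Y, x, vartheta, h, tau):
    """Return (a_vartheta(x), b_vartheta(x)), the minimiser of (6) at the point x."""
    t = (X - x) @ vartheta                 # vartheta' X_ix with X_ix = X_i - x
    w = epanechnikov(t / h)                # K(vartheta' X_ix / h)
    keep = w > 0                           # zero-weight points do not change (6)
    Z = np.column_stack([np.ones(keep.sum()), t[keep]])
    a, b = weighted_quantile_regression(Z, Y[keep], w[keep], tau)
    return a, b
```

For large $n$ the dense equality matrix should be replaced by a sparse one, or a dedicated quantile regression solver used.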

2 Numerical studies 

As in Section 1, the minimization problem (5) can be decomposed into two simpler minimization problems.

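• Fixing $\theta = \vartheta$ and $w_{ij}^\vartheta = K_h(\vartheta^\top X_{ij})$, the estimates of $a_j$ and $d_j$ are obtained by minimizing

$$\sum_{i=1}^{n} \rho(Y_i - a_j - d_j \vartheta^\top X_{ij})\, w_{ij}^\vartheta.$$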

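• Fixing $a_j$ and $d_j$, the minimization with respect to $\theta$ can be done as follows. Let $Y_{ij}^* = w_{ij}^\vartheta (Y_i - a_j)$ and $X_{ij}^* = w_{ij}^\vartheta d_j X_{ij}$. Since $\rho_\tau(cv) = c\rho_\tau(v)$ for any $c \ge 0$, the problem becomes

$$\min_\theta \sum_{i=1}^{n} \sum_{j=1}^{n} \rho(Y_{ij}^* - \theta^\top X_{ij}^*).$$

Suppose the solution to the above problem is $\hat\theta$. Standardize it to $\hat\theta := \hat\theta/|\hat\theta|$.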

Set $\vartheta = \hat\theta$ and repeat the two steps until convergence. Note that both steps are simple linear quantile regression problems, for which several efficient algorithms are available; see Koenker (2005).
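The two steps above can be put together as follows. This is a hedged Python sketch rather than the authors' code: it assumes an Epanechnikov kernel, drops design points whose kernel window contains fewer than `min_points` observations (a crude stand-in for trimming), and absorbs the weights into $(Y_{ij}^*, X_{ij}^*)$ using $\rho_\tau(cv) = c\rho_\tau(v)$ for $c \ge 0$. Both steps call the same LP-based weighted quantile regression.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_qr(Z, y, w, tau):
    """sum_i w_i * rho_tau(y_i - Z_i @ beta) minimised as an LP (y - Z beta = u - v)."""
    n, p = Z.shape
    c = np.concatenate([np.zeros(p), tau * w, (1.0 - tau) * w])
    A_eq = np.hstack([Z, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def kernel(u):
    """Epanechnikov kernel (an assumed choice)."""
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def single_index_qr(X, Y, theta_init, h, tau=0.5, n_iter=10, min_points=5, tol=1e-6):
    """Alternate the two linear quantile regressions of Section 2 until theta settles."""
    n, _ = X.shape
    theta = theta_init / np.linalg.norm(theta_init)
    for _ in range(n_iter):
        ys, xs = [], []
        for j in range(n):
            Xij = X - X[j]                      # X_ij = X_i - X_j for all i
            t = Xij @ theta
            w = kernel(t / h)                   # w_ij (the 1/h factor of K_h is irrelevant)
            keep = w > 0
            if keep.sum() < min_points:         # hypothetical trimming of sparse points
                continue
            Z = np.column_stack([np.ones(keep.sum()), t[keep]])
            a_j, d_j = weighted_qr(Z, Y[keep], w[keep], tau)      # step 1
            ys.append(w[keep] * (Y[keep] - a_j))                  # Y*_ij
            xs.append((w[keep] * d_j)[:, None] * Xij[keep])       # X*_ij
        # Step 2: unweighted linear quantile regression of Y* on X*, no intercept.
        y_star, x_star = np.concatenate(ys), np.vstack(xs)
        new = weighted_qr(x_star, y_star, np.ones(len(y_star)), tau)
        new = new / np.linalg.norm(new)         # standardise to |theta| = 1
        if new @ theta < 0:                     # the index is identified only up to sign
            new = -new
        done = np.linalg.norm(new - theta) < tol
        theta = new
        if done:
            break
    return theta
```

For example, on data from $Y = \theta_0^\top X + 0.1\varepsilon$ with $X \sim N(0, I_2)$ and a deliberately poor start $\vartheta = (1, 0)^\top$, we would expect the iterates to approach $\theta_0$ within a few passes, in the spirit of Remark 6.2. A nonlinear link such as (8) also needs a reasonable starting value.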

Example 2.1 (Single-index median regression) Consider the model

$$y = \exp\{-5(\theta_0^\top X)^2\} + \varepsilon, \tag{8}$$

where $X = \Sigma_0^{1/2} X_0$ with $X_0 \sim N(0, I_5)$ and $\Sigma_0 = (0.5^{|i-j|})_{1 \le i, j \le 5}$. For the noise term, we consider several distributions with both heavy and thin tails. For simplicity, we consider median regression only. For comparison, we also run MAVE, in which a least-squares type estimation is used. For sample sizes $n = 100$ and $n = 200$, we carried out 100 replications. The results are listed in Table 1.
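A Python sketch of the data-generating process of Example 2.1. The index vector $\theta_0$ is not given in the text above, so any value passed in (such as the one in the usage line) is hypothetical; the noise scalings are read from the column headers of Table 1, and $\Sigma_0^{1/2}$ is taken to be the symmetric square root.

```python
import numpy as np


def sigma0(d=5, rho=0.5):
    """Sigma_0 = (rho^{|i-j|})_{1 <= i, j <= d}."""
    idx = np.arange(d)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def sym_sqrt(S):
    """Symmetric square root of a positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(vals)) @ vecs.T


# Noise distributions as labelled in Table 1.
NOISES = {
    "0.05 t(1)": lambda rng, n: 0.05 * rng.standard_t(1, n),
    "0.1 (N(0,1)^4 - 3)": lambda rng, n: 0.1 * (rng.standard_normal(n) ** 4 - 3.0),
    "sqrt(5) t(5) / 20": lambda rng, n: np.sqrt(5.0) * rng.standard_t(5, n) / 20.0,
    "N(0,1) / 4": lambda rng, n: rng.standard_normal(n) / 4.0,
}


def simulate_example_21(n, theta0, noise, seed=None):
    """Draw (X, y) from model (8) with X = Sigma_0^{1/2} X_0, X_0 ~ N(0, I_d)."""
    rng = np.random.default_rng(seed)
    d = len(theta0)
    X0 = rng.standard_normal((n, d))
    X = X0 @ sym_sqrt(sigma0(d))          # row form of Sigma_0^{1/2} X_0 (root is symmetric)
    eps = NOISES[noise](rng, n)
    y = np.exp(-5.0 * (X @ theta0) ** 2) + eps
    return X, y


theta0 = np.array([1.0, 2.0, 0.0, 0.0, 0.0]) / np.sqrt(5.0)   # hypothetical index
X, y = simulate_example_21(100, theta0, "0.05 t(1)", seed=0)
```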



Table 1: Estimation errors (with standard errors in parentheses) for model (8), based on the quadratic loss function (MAVE) and the 50% quantile (qMAVE)

                      Distribution of ε
size  method   0.05t(1)          0.1(N(0,1)^4 - 3)   √5 t(5)/20        N(0,1)/4
100   MAVE     0.3641 (0.3526)   0.3530 (0.3102)     0.0401 (0.0182)   0.0581 (0.0263)
      qMAVE    0.0902 (0.1074)   0.1512 (0.1957)     0.0833 (0.0785)   0.1146 (0.0651)
200   MAVE     0.3381 (0.3389)   0.2859 (0.2887)     0.0232 (0.0091)   0.0373 (0.0147)
      qMAVE    0.0681 (0.1415)   0.0581 (0.0698)     0.0402 (0.0173)   0.0652 (0.0272)


The MAVE method with the quadratic loss function performs very poorly when the noise has heavy tails (e.g. t(1)) or is highly asymmetric (e.g. N(0,1)^4). With the absolute value loss function, the performance is much better. Even when the noise is thin-tailed and symmetric, qMAVE still performs reasonably well, although MAVE is then more accurate.



3 Assumptions and asymptotic properties 

We adopt model (4) throughout and make the additional assumption that $\{(X_i, Y_i)\}_{i=1}^n$ are i.i.d. observations. The extension to weakly dependent time series should be straightforward but complicates matters without adding anything conceptually. Furthermore, the following conditions are assumed in the proof of Theorem 6.1.
(A1) For each $v \in \mathbb{R}$, $\rho(v)$ is absolutely continuous, i.e., there is a function $\varphi(\cdot)$ such that $\rho(v) = \rho(0) + \int_0^v \varphi(t)\,dt$. The probability density function of $\varepsilon_i$ is bounded and continuously differentiable. $E\{\varphi(\varepsilon_i) \mid X_i\} = 0$ almost surely and $E|\varphi(\varepsilon_i)|^{\nu_1} \le M_0 < \infty$ for some $\nu_1 > 2$.

(A2) The function $\varphi(\cdot)$ satisfies the Lipschitz condition in $(a_j, a_{j+1})$, $j = 0, \ldots, m$, where $a_1 < \cdots < a_m$ are the finitely many jump discontinuity points of $\varphi(\cdot)$, $a_0 = -\infty$, $a_{m+1} = +\infty$ and $m < \infty$.

(A3) The kernel function $K(\cdot)$ is a symmetric density function with compact support and satisfies $|u^j K(u) - v^j K(v)| \le C|u - v|$ for all $j$ with $0 \le j \le 3$.

(A4) The link function $m(\cdot)$ defined in (4) has continuous and bounded derivatives up to the third order.

(A5) The smoothing parameter $h$ is chosen such that $nh^4 \to \infty$ and $nh^5/\log n < \infty$.

Note that (A1) and (A2) are satisfied in quantile regression with $\rho(\cdot) = \rho_\tau(\cdot)$ given in (1). Conditions (A3) and (A4) are standard in kernel smoothing. Based on (A1) and (A2), Hong (2003) proved that there is a constant $C > 0$ such that, for all small $t$ and all $x$,

$$E[\{\varphi(Y - t - a) - \varphi(Y - a)\}^2 \mid X = x] \le C|t| \tag{9}$$

holds for all $(a, x)$ in a neighborhood of $\{m(x^\top\theta_0), x\}$. Define

$$G(t; x) = E\{\rho(Y - m(x^\top\theta_0) + t) \mid X = x\}, \qquad G_i(t; x) = \frac{\partial^i}{\partial t^i} G(t; x), \quad i = 1, 2, 3. \tag{10}$$

Then it follows that

$$g(x) \stackrel{\mathrm{def}}{=} G_2(0; x) > C > 0,$$

and $G_3(t; x)$ is continuous and uniformly bounded for all $x \in \mathcal{D}$ and $t$ near 0. For quantile regression, $g(x) = f_\varepsilon(0 \mid x)$, where $f_\varepsilon(\cdot \mid x)$ is the conditional probability density function of $\varepsilon$ given $X = x$.
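The identity $g(x) = G_2(0; x) = f_\varepsilon(0 \mid x)$ can be checked numerically. For the check loss, $G_1(t; x) = \tau - F_\varepsilon(-t \mid x)$ and hence $G_2(t; x) = f_\varepsilon(-t \mid x)$. The Python sketch below assumes $\varepsilon = Z - \Phi^{-1}(\tau)$ with $Z \sim N(0, 1)$, so that $P(\varepsilon \le 0) = \tau$ as required by (A1); it evaluates $G(t) = E\rho_\tau(\varepsilon + t)$ by numerical integration and compares finite differences with $f_\varepsilon(0) = \phi\{\Phi^{-1}(\tau)\}$.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm


def check_loss(v, tau):
    """rho_tau(v) from (1) for a scalar v."""
    return v * (tau - (v < 0))


def G(t, tau):
    """G(t) = E rho_tau(eps + t) for eps = Z - Phi^{-1}(tau), Z ~ N(0, 1)."""
    q = norm.ppf(tau)

    def integrand(e):
        return check_loss(e + t, tau) * norm.pdf(e + q)   # f_eps(e) = phi(e + q)

    # Split at the kink e = -t so that each piece is smooth.
    left, _ = integrate.quad(integrand, -np.inf, -t, epsabs=1e-11, epsrel=1e-11)
    right, _ = integrate.quad(integrand, -t, np.inf, epsabs=1e-11, epsrel=1e-11)
    return left + right


def derivatives_at_zero(tau, step=0.05):
    """Central finite differences for G_1(0) and G_2(0)."""
    g_plus, g_zero, g_minus = G(step, tau), G(0.0, tau), G(-step, tau)
    g1 = (g_plus - g_minus) / (2.0 * step)
    g2 = (g_plus - 2.0 * g_zero + g_minus) / step ** 2
    return g1, g2


tau = 0.3
g1, g2 = derivatives_at_zero(tau)
print(g1, g2, norm.pdf(norm.ppf(tau)))   # G_1(0) is about 0 and G_2(0) about f_eps(0)
```

The vanishing first derivative is exactly the moment condition $E\{\varphi(\varepsilon) \mid X\} = 0$ in (4).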

4 Initial estimator of $\theta_0$

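We use the average derivative estimation method (ADE; Härdle and Stoker, 1989; Chaudhuri et al., 1997) to obtain an initial estimate of $\theta_0$, based on the fact that $E[\partial m(\theta_0^\top X)/\partial X] = \theta_0 E[\partial m(\theta_0^\top X)/\partial(\theta_0^\top X)]$, so that

$$\theta_0 = \frac{E[\partial m(\theta_0^\top X)/\partial X]}{E[\partial m(\theta_0^\top X)/\partial(\theta_0^\top X)]}. \tag{11}$$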
For any $x \in \mathbb{R}^d$ and a kernel density function $H(\cdot): \mathbb{R}^d \to \mathbb{R}^+$, denote by $[\hat a(x), \hat b(x)]$ the minimizer of

$$\sum_{i=1}^{n} H(X_{ix}/h_0)\, \rho(Y_i - a - b^\top X_{ix})$$

with respect to $a$ and $b$. In view of (11), an initial estimate of $\theta_0$ can be constructed as

$$\vartheta = \sum_{j=1}^{n} c(X_j)\hat b(X_j) \Big/ \Big|\sum_{j=1}^{n} c(X_j)\hat b(X_j)\Big|, \tag{12}$$

where $c(x)$ is a trimming function introduced to deal with boundary effects.
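A hedged Python sketch of the initial estimator (12): at each $X_j$ a multivariate local linear quantile regression gives $\hat b(X_j)$, and the normalized average is returned. The product Epanechnikov kernel for $H$, the default $c(x) \equiv 1$ and the `min_points` guard are assumptions for illustration; the paper leaves $H$ and $c$ generic.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_qr(Z, y, w, tau):
    """sum_i w_i * rho_tau(y_i - Z_i @ beta) minimised as an LP (y - Z beta = u - v)."""
    n, p = Z.shape
    c = np.concatenate([np.zeros(p), tau * w, (1.0 - tau) * w])
    A_eq = np.hstack([Z, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def product_kernel(U):
    """H(u) = prod_k K(u_k) with Epanechnikov K: symmetric about 0 in each coordinate."""
    K = np.where(np.abs(U) < 1.0, 0.75 * (1.0 - U ** 2), 0.0)
    return K.prod(axis=1)


def ade_initial_estimate(X, Y, h0, tau=0.5, trim=None, min_points=None):
    """Initial index estimate (12): normalised average of local linear QR gradients."""
    n, d = X.shape
    if trim is None:
        trim = lambda x: 1.0                # c(x) = 1, i.e. no trimming (an assumption)
    if min_points is None:
        min_points = 3 * (d + 1)            # hypothetical guard against near-empty windows
    total = np.zeros(d)
    for j in range(n):
        Xij = X - X[j]                      # X_ix evaluated at x = X_j
        w = product_kernel(Xij / h0)        # H(X_ix / h_0)
        keep = w > 0
        if keep.sum() < min_points:
            continue
        Z = np.column_stack([np.ones(keep.sum()), Xij[keep]])
        coef = weighted_qr(Z, Y[keep], w[keep], tau)   # [a(x), b(x)'] at x = X_j
        total += trim(X[j]) * coef[1:]      # c(X_j) * b(X_j)
    return total / np.linalg.norm(total)
```

Note that (11), and hence (12), requires $E[\partial m(\theta_0^\top X)/\partial(\theta_0^\top X)] \ne 0$. For an even link such as the one in (8) with symmetric $X$ this expectation is zero, so another starting value would be needed there.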

The consistency of $\vartheta$ in (12) can be proved using results on the uniform Bahadur representation of $\hat b(x)$ over any compact subset $\mathcal{D}$ of the support of $X$. Suppose $H(\cdot)$ is symmetric about 0 in each coordinate direction and the conditions of Proposition 3.1 and Corollary 3.3 in Kong et al (2007) are met, in particular $nh_0^{d+4}/\log n < \infty$ and $nh_0^d/\log n \to \infty$. Then, with probability one,

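$$\hat b(x) = m'(\theta_0^\top x)\theta_0 + \frac{1}{nh_0^{d+1}}\{fg\}^{-1}(x)\sum_{i=1}^{n} H(X_{ix}/h_0)\varphi(\varepsilon_i)\frac{X_{ix}}{h_0} + O\Big\{h_0^{-1}\Big(\frac{\log n}{nh_0^d}\Big)^{3/4}\Big\} \tag{13}$$

uniformly in $x \in \mathcal{D}$, where $\{fg\}(x) = f(x)g(x)$ with $f(\cdot)$ the density function of $X$ and $g(x) > 0$ some deterministic function. This in turn implies that, with probability one,

$$\frac1n\sum_{j=1}^{n} c(X_j)\hat b(X_j) = \frac1n\sum_{j=1}^{n} c(X_j)m'(\theta_0^\top X_j)\theta_0 + \frac{1}{n^2h_0^{d+1}}\sum_{i,j=1}^{n} c(X_j)\{fg\}^{-1}(X_j)H(X_{ij}/h_0)\varphi(\varepsilon_i)\frac{X_{ij}}{h_0} + O\Big\{h_0^{-1}\Big(\frac{\log n}{nh_0^d}\Big)^{3/4}\Big\}.$$

Using results in Masry (1996), we know that with probability 1,

$$\frac{1}{nh_0^d}\sum_{i=1}^{n} H(X_{ix}/h_0)\varphi(\varepsilon_i)\frac{X_{ix}}{h_0} = O\{(nh_0^d/\log n)^{-1/2}\}$$

uniformly in $x \in \mathcal{D}$, whence

$$\frac{1}{n^2h_0^{d+1}}\sum_{i,j=1}^{n} c(X_j)\{fg\}^{-1}(X_j)H(X_{ij}/h_0)\varphi(\varepsilon_i)\frac{X_{ij}}{h_0} = O\{(nh_0^{d+2}/\log n)^{-1/2}\}$$

almost surely. Therefore, for the initial estimator $\vartheta$ in (12), we have

$$\delta_\vartheta := \theta_0 - \vartheta = O\{(nh_0^{d+2}/\log n)^{-1/2}\} \tag{14}$$

almost surely. Consequently, from now on we focus on the parameter space $\Theta_n = \{\vartheta : |\delta_\vartheta| \le C(nh_0^{d+2}/\log n)^{-1/2}\}$ for some constant $C > 0$.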



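5 Asymptotics of $a_\vartheta(x)$ and $b_\vartheta(x)$

For any $\vartheta \in \Theta_n$, denote by $f_\vartheta(x)$ and $F_\vartheta(x)$ the probability density function and the distribution function of $\vartheta^\top X$ evaluated at $\vartheta^\top x$, respectively, and for any $v \in \mathbb{R}$ and $x \in \mathbb{R}^d$ define

$$m_\vartheta(v) = \arg\min_a E\{\rho(Y - a) \mid X^\top\vartheta = v\},$$
$$G_\vartheta(t, x) = E\{\rho(Y - m_\vartheta(x) + t) \mid \vartheta^\top X = \vartheta^\top x\},$$
$$G_\vartheta^i(t, x) = \frac{\partial^i}{\partial t^i} G_\vartheta(t, x), \quad i = 1, 2; \qquad g_\vartheta(x) = G_\vartheta^2(0, x).$$

Apparently $g_{\theta_0}(x) = g(x)$. We assume that for any $\vartheta$ in a neighborhood of $\theta_0$, $G_\vartheta^i(t, x)$ is continuous and uniformly bounded in a neighborhood of $(0, x)$, and that there exists some $\delta > 0$ such that $g_\vartheta(x) > \delta$ for $\vartheta$ near enough to $\theta_0$ and $x \in \mathcal{D}$.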

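With the initial estimate $\vartheta$, let $[\hat a_j, \hat b_j] = [a_\vartheta(X_j), b_\vartheta(X_j)]$ be the solution to (6) with $x$ specified as $X_j$. If the smoothing parameter $h$ is chosen such that $nh/\log n \to \infty$ and $nh^5/\log n < \infty$, then, using the results on uniform Bahadur representation in Kong et al (2007), we have

$$\hat a_j - m_\vartheta(X_j) = \frac{1}{nh}\{gf\}_\vartheta^{-1}(X_j)\sum_{i=1}^{n} K_{ij}^\vartheta\varphi(Y_{ij}^*) + O\Big\{\Big(\frac{\log n}{nh}\Big)^{3/4}\Big\}, \tag{15}$$
$$h\{\hat b_j - m_\vartheta'(X_j)\} = \frac{1}{nh}\{gf\}_\vartheta^{-1}(X_j)\sum_{i=1}^{n} K_{ij}^\vartheta\varphi(Y_{ij}^*)\frac{\vartheta^\top X_{ij}}{h} + O\Big\{\Big(\frac{\log n}{nh}\Big)^{3/4}\Big\},$$

uniformly in $X_j \in \mathcal{D}$, where $K_{ij}^\vartheta = K(\vartheta^\top X_{ij}/h)$ and $Y_{ij}^* = Y_i - m_\vartheta(X_j) - m_\vartheta'(X_j)\vartheta^\top X_{ij}$. Note that $m_\vartheta(X_j) := m_\vartheta(X_j^\top\vartheta)$ and $m_\vartheta'(X_j) := m_\vartheta'(X_j^\top\vartheta)$.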

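Combined with Lemma 6.5 and Lemma 6.6 in the Appendix, (15) yields

$$\hat a_j - a_j = \tfrac12 m''(X_j^\top\theta_0)h^2 + b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\} + (nh)^{-1}\{gf\}_\vartheta^{-1}(X_j)\sum_{i=1}^{n}\phi_{ij} + O\Big\{\Big(\frac{\log n}{nh}\Big)^{3/4} + h^4 + h\delta_\vartheta\Big\}, \tag{16}$$

$$\begin{aligned}
\hat b_j - b_j &= h^2\Big[\tfrac12 m''(X_j^\top\theta_0)\{(f\mu)'/(fg)\}_\vartheta(X_j) + \tfrac16 m'''(X_j^\top\theta_0)\{(f\mu)/(fg)\}_\vartheta(X_j)\Big]\\
&\quad + b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j) + (nh^2)^{-1}\{gf\}_\vartheta^{-1}(X_j)\sum_{i=1}^{n}\psi_{ij} + O\Big\{h^3 + h^2\delta_\vartheta + \Big(\frac{\log n}{nh}\Big)^{3/4}\Big/h\Big\}
\end{aligned}$$

uniformly in $j$ with $X_j \in \mathcal{D}$, where $a_j = m(X_j^\top\theta_0)$, $b_j = m'(X_j^\top\theta_0)$, $(\nu/\mu)_\vartheta(X_j) = \nu_\vartheta(X_j^\top\vartheta)/\mu_\vartheta(X_j^\top\vartheta)$,

$$\mu_\vartheta(v) = E[g(X) \mid X^\top\vartheta = v], \qquad \nu_\vartheta(v) = E[g(X)X \mid X^\top\vartheta = v], \tag{17}$$

and $\psi_{ij}$ and $\phi_{ij}$ are zero-mean i.i.d. random variables defined as

$$\phi_{ij} = K_{ij}^\vartheta\varphi(Y_{ij}^*) - E[K_{ij}^\vartheta\varphi(Y_{ij}^*)], \qquad \psi_{ij} = K_{ij}^\vartheta\varphi(Y_{ij}^*)\frac{\vartheta^\top X_{ij}}{h} - E\Big[K_{ij}^\vartheta\varphi(Y_{ij}^*)\frac{\vartheta^\top X_{ij}}{h}\Big]. \tag{18}$$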

Note that (16) describes the almost sure behaviour of $[\hat a_j, \hat b_j]$. Welsh (1996) studied their asymptotic bias and variance, namely

$$E\{\hat a(x)\} = m_\vartheta(x) + O(h^2), \qquad E\{\hat b(x)\} = m_\vartheta'(x) + O(h^2),$$
$$\mathrm{Var}\{\hat a(x)\} = O\{(nh)^{-1}\}, \qquad \mathrm{Var}\{\hat b(x)\} = O\{(nh^3)^{-1}\}, \tag{19}$$

where the $O(\cdot)$ terms hold uniformly in $x$ in any compact subset of the support of $X$.

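6 Asymptotics of $\hat\theta$

For the previously obtained $\vartheta$, $\hat a_j$, $\hat b_j$, $j = 1, \ldots, n$, suppose $\hat\theta$ minimizes $\Phi_n(\theta)$, where

$$\Phi_n(\theta) = \sum_{i=1}^{n}\sum_{j=1}^{n} K_{ij}^\vartheta\rho(Y_i - \hat a_j - \hat b_j\theta^\top X_{ij}) + \frac{n^2h}{2}(\theta - \vartheta)^\top\vartheta\vartheta^\top(\theta - \vartheta).$$

Apparently, $\hat\theta$ also minimizes

$$\tilde\Phi_n(\theta) = \Phi_n(\theta) - \sum_{i,j}K_{ij}^\vartheta\rho(Y_{ij}) - \frac{n^2h}{2}\delta_\vartheta^\top\vartheta\vartheta^\top\delta_\vartheta = \Psi_n(\theta) + n^2h\{\tfrac12\delta_\theta^\top\vartheta\vartheta^\top\delta_\theta - \delta_\vartheta^\top\vartheta\vartheta^\top\delta_\theta\},$$

where $Y_{ij} = Y_i - \hat a_j - \hat b_jX_{ij}^\top\theta_0$, $\delta_\theta = \theta_0 - \theta$, $\delta_\vartheta = \theta_0 - \vartheta$ and $\Psi_n(\theta) = \sum_{i,j}K_{ij}^\vartheta\{\rho(Y_i - \hat a_j - \hat b_j\theta^\top X_{ij}) - \rho(Y_{ij})\}$. Let $a_{n,\vartheta} = \max\{(\log\log n/n)^{1/2}, |\delta_\vartheta|\}$. As $|\vartheta - \theta_0| = O(a_{n,\vartheta})$, we have $\vartheta\vartheta^\top = \theta_0\theta_0^\top + O(a_{n,\vartheta})$, whence for any $\theta$ with $\delta_\theta = O(a_{n,\vartheta})$,

$$\tilde\Phi_n(\theta) = \Psi_n(\theta) + n^2h\{\tfrac12\delta_\theta^\top\theta_0\theta_0^\top\delta_\theta - \delta_\vartheta^\top\theta_0\theta_0^\top\delta_\theta\} + o(n^2ha_{n,\vartheta}^2).$$

Write $\Psi_n(\theta) = E[\Psi_n(\theta)] + \delta_\theta^\top\{R_{n1} - ER_{n1}\} + R_{n2}(\theta) - ER_{n2}(\theta)$, where

$$R_{n1} = \sum_{i,j}K_{ij}^\vartheta\varphi(Y_{ij})\hat b_jX_{ij}, \qquad R_{n2}(\theta) = \Psi_n(\theta) - \delta_\theta^\top R_{n1}.$$

Applying the results on $E\{\Psi_n(\theta)\}$ in Lemma 6.11, we have

$$\Psi_n(\theta) = \delta_\theta^\top R_{n1} + \tfrac12\delta_\theta^\top G_{n,\vartheta}\delta_\theta\{1 + o(1)\} + R_{n2}(\theta) - ER_{n2}(\theta), \tag{21}$$

where

$$G_{n,\vartheta} = \sum_{i,j}E[K_{ij}^\vartheta g(X_i)b_j^2X_{ij}X_{ij}^\top] = n^2hS_2\{1 + O(\delta_\vartheta)\},$$
$$S_2 = \int\{m'(x^\top\theta_0)\}^2\omega_{\theta_0}(x)f_{\theta_0}(x)\,dx,$$

and $\omega_\vartheta(x) = E\{g(X)(X - x)(X - x)^\top \mid X^\top\vartheta = x^\top\vartheta\}$. Consequently,

$$\tilde\Phi_n(\theta) = (R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta)^\top\delta_\theta + \tfrac12\delta_\theta^\top(G_{n,\vartheta} + n^2h\theta_0\theta_0^\top)\delta_\theta\{1 + o(1)\} + R_{n2}(\theta) - ER_{n2}(\theta).$$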
Our main result is as follows.

Theorem 6.1 Suppose (A1)-(A4) hold. With $\nu_\vartheta(\cdot)$ and $\mu_\vartheta(\cdot)$ as defined in (17), we have

$$\hat\theta - \theta_0 = (S_2 + \theta_0\theta_0^\top)^{-1}\frac1n\sum_{i=1}^{n}\varphi(\varepsilon_i)b_i\{\varpi f\}_{\theta_0}(X_i) - (S_2 + \theta_0\theta_0^\top)^{-1}(\Omega + \theta_0\theta_0^\top)\delta_\vartheta + \alpha_n|\vartheta - \theta_0| + o(n^{-1/2}) \tag{22}$$

almost surely, where $\varpi_{\theta_0}(x) = E(X \mid X^\top\theta_0 = x^\top\theta_0) - x$, $\alpha_n = o(1)$ uniformly in $\vartheta$, and

$$\Omega = E\big[\{m'(X^\top\theta_0)\}^2\mu_{\theta_0}(X)\{(\nu/\mu)_{\theta_0}(X) - X\}\{(\nu/\mu)_{\theta_0}(X) - X\}^\top\big].$$

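Remark 6.2 In Lemma 6.16 we prove that if $\delta_\vartheta \ne 0$,

$$0 < \frac{|(S_2 + \theta_0\theta_0^\top)^{-1}(\Omega + \theta_0\theta_0^\top)\delta_\vartheta|}{|\delta_\vartheta|} < 1. \tag{23}$$

This implies that, as the iteration of (6) and (7) is repeated, the effect of the initial estimation error $\vartheta - \theta_0$ on $\hat\theta - \theta_0$ decreases geometrically.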

Remark 6.3 Theorem 6.1 is proved under the assumption that $\{(X_i, Y_i)\}_{i=1}^n$ are i.i.d. observations. It is possible, however, to extend this result to time series observations, provided that the time dependence (usually measured by mixing coefficients) is weak enough; an example is a stationary $\beta$-mixing process, which satisfies

$$\beta(k) = E\sup_{B \in \mathcal{F}_k^\infty}|P(B) - P(B \mid \mathcal{F}_{-\infty}^0)| \to 0 \quad \text{as } k \to \infty,$$

where $\mathcal{F}_a^b$ is the $\sigma$-algebra generated by $\{(X_i, Y_i)\}_{i=a}^b$.



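Lemma 6.4 Under the conditions of Theorem 6.1, we have

$$(n^2h)^{-1}R_{n1} = \frac1n\sum_{i=1}^{n}\varphi(\varepsilon_i)b_i\{\varpi f\}_{\theta_0}(X_i) - \Omega\delta_\vartheta + \alpha_n|\vartheta - \theta_0| + o(n^{-1/2}) \quad \text{a.s.} \tag{24}$$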

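Proof of Theorem 6.1 Based on (24), it suffices to prove that

$$\hat\theta - \theta_0 = \{n^2h(S_2 + \theta_0\theta_0^\top)\}^{-1}(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta) \quad \text{a.s.} \tag{25}$$

As the first step in proving (25), we show in Lemma 6.13 and Lemma 6.14 that for each fixed $\theta$,

$$(n^2ha_{n,\vartheta}^2)^{-1}[R_{n2}(\theta) - ER_{n2}(\theta)] = o(1) \quad \text{a.s.} \tag{26}$$

This, together with (21) and the fact that $G_{n,\vartheta} = n^2hS_2\{1 + O(\delta_\vartheta)\}$, implies that for any fixed $\theta$,

$$(n^2ha_{n,\vartheta}^2)^{-1}\big[\tilde\Phi_n(\theta) - \delta_\theta^\top(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta) - \tfrac12 n^2h\delta_\theta^\top(S_2 + \theta_0\theta_0^\top)\delta_\theta\big] \to 0 \quad \text{a.s.}$$

As both $\tilde\Phi_n(\theta) - \delta_\theta^\top(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta)$ and $\tfrac12 n^2h\delta_\theta^\top(S_2 + \theta_0\theta_0^\top)\delta_\theta$ are convex in $\theta$, it follows from Lemma 6.7 that for any compact set $\Theta_{n,\vartheta} \subset \Theta_n$ (a convex open set),

$$\sup_{\theta \in \Theta_{n,\vartheta}}(n^2ha_{n,\vartheta}^2)^{-1}\big|\tilde\Phi_n(\theta) - \delta_\theta^\top(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta) - \tfrac12 n^2h\delta_\theta^\top(S_2 + \theta_0\theta_0^\top)\delta_\theta\big| \to 0 \quad \text{a.s.} \tag{27}$$

Let $\eta_n = \{n^2h(S_2 + \theta_0\theta_0^\top)\}^{-1}(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta)$. We are now ready to prove the equivalent form of (25), i.e. that with probability 1, for any $\delta > 0$, $|\hat\theta - \theta_0 - \eta_n|/a_{n,\vartheta} < \delta$ for large $n$.

First note that, as $\theta_0 + \eta_n$ is bounded with probability 1, $\Theta_{n,\vartheta}$ can be chosen to contain $B_n$, the closed ball with center $\theta_0 + \eta_n$ and radius $a_{n,\vartheta}\delta$. Replacing $\Theta_{n,\vartheta}$ in (27) by $B_n$, we have

$$\Delta_n = \sup_{\theta \in B_n}(n^2ha_{n,\vartheta}^2)^{-1}\big|\tilde\Phi_n(\theta) - \delta_\theta^\top(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta) - \tfrac12 n^2h\delta_\theta^\top(S_2 + \theta_0\theta_0^\top)\delta_\theta\big| = o(1) \quad \text{a.s.} \tag{28}$$

Now consider the behavior of $\tilde\Phi_n(\theta)$ outside $B_n$. Suppose $\theta = \theta_0 + \eta_n + a_{n,\vartheta}\beta v$ for some $\beta > \delta$ and a unit vector $v$. Define $\theta^*$ as the boundary point of $B_n$ that lies on the line segment from $\theta_0 + \eta_n$ to $\theta$, i.e. $\theta^* = \theta_0 + \eta_n + a_{n,\vartheta}\delta v$. Writing $A_n = R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta$, convexity of $\tilde\Phi_n(\theta)$ and the definition of $\Delta_n$ imply

$$\begin{aligned}
\frac{\delta}{\beta}\tilde\Phi_n(\theta) + \Big(1 - \frac{\delta}{\beta}\Big)\tilde\Phi_n(\theta_0 + \eta_n) &\ge \tilde\Phi_n(\theta^*)\\
&\ge \tfrac12 n^2h\delta^2a_{n,\vartheta}^2v^\top(S_2 + \theta_0\theta_0^\top)v - \tfrac12(n^2h)^{-1}A_n^\top(S_2 + \theta_0\theta_0^\top)^{-1}A_n - n^2ha_{n,\vartheta}^2\Delta_n\\
&\ge \tfrac12 n^2h\delta^2a_{n,\vartheta}^2v^\top(S_2 + \theta_0\theta_0^\top)v + \tilde\Phi_n(\theta_0 + \eta_n) - 2n^2ha_{n,\vartheta}^2\Delta_n.
\end{aligned}$$

It follows that

$$\inf_{|\theta - \theta_0 - \eta_n| > \delta a_{n,\vartheta}}\tilde\Phi_n(\theta) \ge \tilde\Phi_n(\theta_0 + \eta_n) + \frac{\beta}{\delta}n^2ha_{n,\vartheta}^2\big[\tfrac12\delta^2v^\top(S_2 + \theta_0\theta_0^\top)v - 2\Delta_n\big].$$

As $S_2 + \theta_0\theta_0^\top$ is positive definite, (28) implies that, with probability 1, $\delta^2v^\top(S_2 + \theta_0\theta_0^\top)v > 4\Delta_n$ for large enough $n$. Hence, for any $\delta > 0$ and large enough $n$, the minimum of $\tilde\Phi_n(\theta)$ must occur within $B_n$, which implies (25). ■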



Appendix 

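Proof of Lemma 6.4 Write

$$R_{n1} = \sum_{i,j}K_{ij}^\vartheta\varphi(\varepsilon_i)b_jX_{ij} + \sum_{i,j}K_{ij}^\vartheta\varphi(\varepsilon_i)(\hat b_j - b_j)X_{ij} + \sum_{i,j}K_{ij}^\vartheta\hat b_jX_{ij}\{\varphi(Y_{ij}) - \varphi(\varepsilon_i)\},$$

and let $E_j$ denote expectation taken with respect to $X_j$ for given $X_i$. We will show that

$$\frac{1}{n^2h}\sum_{i,j}K_{ij}^\vartheta\varphi(\varepsilon_i)b_jX_{ij} = \frac1n\sum_i\varphi(\varepsilon_i)b_i\{\varpi f\}_{\theta_0}(X_i) + O\{(\log\log n/n)^{1/2}(h^2 + \delta_\vartheta)\}, \tag{29}$$

which, together with Lemma 6.12, leads to (24). First note that

$$E_j[K_{ij}^\vartheta b_jX_{ij}/h] = b_i\{\varpi f\}_\vartheta(X_i) - m''(X_i^\top\theta_0)\{\omega f\}_\vartheta(X_i)\delta_\vartheta + h^2b_i\{\varpi f\}_\vartheta''(X_i) + O(|\delta_\vartheta|^2 + h^4).$$

This, together with Lemma 7.8 in Xia and Tong (2006), gives

$$\frac{1}{n^2h}\sum_{i,j}K_{ij}^\vartheta\varphi(\varepsilon_i)b_jX_{ij} = \frac1n\sum_i\varphi(\varepsilon_i)b_i\{\varpi f\}_\vartheta(X_i) + O\{(\log\log n/n)^{1/2}(h^2 + \delta_\vartheta)\},$$

from which (29) follows, as $\{\varpi f\}_\vartheta(\cdot)$ is Lipschitz continuous in $\vartheta$. ■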
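Lemma 6.5

$$m_\vartheta(X_j) - a_j = b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\} + o(|\delta_\vartheta|), \tag{30}$$
$$m_\vartheta'(X_j) - b_j = b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j) + o(|\delta_\vartheta|). \tag{31}$$

Proof It follows from the property of conditional expectation that

$$E\{\rho(Y - a) \mid X^\top\vartheta = x^\top\vartheta\} = E[E\{\rho(Y - a) \mid X\} \mid X^\top\vartheta = x^\top\vartheta] = E[G\{m(\theta_0^\top X) - a; X\} \mid X^\top\vartheta = x^\top\vartheta].$$

Using the differentiability of $G(t; X)$ in $t$ and $G_1(0; X) = E\{\varphi(\varepsilon) \mid X\} = 0$, we have

$$G\{m(\theta_0^\top X) - a; X\} = G(0; X) + g(X)\{m(\theta_0^\top X) - a\}^2/2 + O[\{m(\theta_0^\top X) - a\}^3].$$

If $X^\top\vartheta = x^\top\vartheta$ and $\delta_\vartheta = o(1)$, then $m(\theta_0^\top X) - m(\theta_0^\top x) = O\{\theta_0^\top(X - x)\} = O\{\delta_\vartheta^\top(X - x)\} = o(1)$. Therefore, for every $a$ near $m(\theta_0^\top x)$ (and hence near $m(\theta_0^\top X)$, since the two differ by $o(1)$),

$$E[G\{m(\theta_0^\top X) - a; X\} \mid X^\top\vartheta = x^\top\vartheta] - E[G(0; X) \mid X^\top\vartheta = x^\top\vartheta] \approx \tfrac12 E[g(X)\{m(\theta_0^\top X) - a\}^2 \mid X^\top\vartheta = x^\top\vartheta].$$

As $\rho(\cdot)$ is convex, this approximation is in fact uniform over all $a$ near $m(\theta_0^\top x)$, which implies that the minimizer of $E[G\{m(\theta_0^\top X) - a; X\} \mid X^\top\vartheta = x^\top\vartheta]$ is approximately the minimizer of $E[g(X)\{m(\theta_0^\top X) - a\}^2 \mid X^\top\vartheta = x^\top\vartheta]$. We have

$$m(\theta_0^\top X) = m(\theta_0^\top x) + m'(\theta_0^\top x)\theta_0^\top(X - x) + O[\{\theta_0^\top(X - x)\}^2],$$

$$\begin{aligned}
E[g(X)\{m(\theta_0^\top X) - a\}^2 \mid X^\top\vartheta = x^\top\vartheta] &= 2m'(\theta_0^\top x)\{m(\theta_0^\top x) - a\}\delta_\vartheta^\top\{\nu_\vartheta(x^\top\vartheta) - x\mu_\vartheta(x^\top\vartheta)\}\\
&\quad + \{m(\theta_0^\top x) - a\}^2\mu_\vartheta(x^\top\vartheta) + O(|\delta_\vartheta|^2).
\end{aligned} \tag{32}$$

Taking the derivative with respect to $a$ gives (30).

To prove (31), for any $t \to 0$, mimicking (32),

$$\begin{aligned}
&E[g(X)\{m(\theta_0^\top X) - a\}^2 \mid X^\top\vartheta = x^\top\vartheta + t]\\
&\quad = 2m'(\theta_0^\top x)\{m(\theta_0^\top x) - a\}E[g(X)\{t + \delta_\vartheta^\top(X - x)\} \mid X^\top\vartheta = x^\top\vartheta + t] + \{a - m(\theta_0^\top x)\}^2\mu_\vartheta(x^\top\vartheta + t) + O(t^2 + |\delta_\vartheta|^2)\\
&\quad = \{a - m(\theta_0^\top x)\}^2\mu_\vartheta(x^\top\vartheta + t) + 2tm'(\theta_0^\top x)\{m(\theta_0^\top x) - a\}\mu_\vartheta(x^\top\vartheta + t)\\
&\qquad + 2m'(\theta_0^\top x)\{m(\theta_0^\top x) - a\}\delta_\vartheta^\top\{\nu_\vartheta(x^\top\vartheta + t) - x\mu_\vartheta(x^\top\vartheta + t)\} + O(t^2 + |\delta_\vartheta|^2).
\end{aligned}$$

Again taking the derivative with respect to $a$ and using the definition of $m_\vartheta(\cdot)$, we have

$$m_\vartheta(x^\top\vartheta + t) \approx m(\theta_0^\top x) + tm'(\theta_0^\top x) + m'(\theta_0^\top x)\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(x^\top\vartheta + t) - x\}.$$

Recall from (30) that $m_\vartheta(x^\top\vartheta) \approx m(\theta_0^\top x) + m'(\theta_0^\top x)\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(x^\top\vartheta) - x\} + O(|\delta_\vartheta|^2)$. Subtracting this from the equation above and assuming that the first-order derivatives of $\mu_\vartheta$ and $\nu_\vartheta$ are both Lipschitz continuous, we have

$$\begin{aligned}
m_\vartheta(x^\top\vartheta + t) - m_\vartheta(x^\top\vartheta) &\approx tm'(\theta_0^\top x) + m'(\theta_0^\top x)\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(x^\top\vartheta + t) - (\nu/\mu)_\vartheta(x^\top\vartheta)\}\\
&= tm'(\theta_0^\top x) + tm'(\theta_0^\top x)\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(x^\top\vartheta) + O(t^2).
\end{aligned}$$

Dividing by $t$ and letting $t \to 0$ gives (31). ■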
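Lemma 6.6

$$E K_{ij}^\vartheta\varphi(Y_{ij}^*) = \tfrac12 m''(X_j^\top\theta_0)(fg)_\vartheta(X_j)h^3 + O(h^4) + o(h\delta_\vartheta),$$
$$E_i[K_{ij}^\vartheta\varphi(Y_{ij}^*)X_{ij}^\top\vartheta] = h^5\big\{\tfrac12 m''(X_j^\top\theta_0)(f\mu)_\vartheta'(X_j) + \tfrac16 m'''(X_j^\top\theta_0)(f\mu)_\vartheta(X_j)\big\} + O(h^2\delta_\vartheta + h^6). \tag{33}$$

Proof Based on (30) and (31), we have

$$\begin{aligned}
&m(X_i^\top\theta_0) - m_\vartheta(X_j) - m_\vartheta'(X_j)X_{ij}^\top\vartheta\\
&\quad = m(X_i^\top\theta_0) - m(X_j^\top\theta_0) - b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\} - [b_j + b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j)]X_{ij}^\top\vartheta + o(|\delta_\vartheta|)\\
&\quad = b_jX_{ij}^\top\delta_\vartheta + \tfrac12 m''(X_j^\top\theta_0)(X_{ij}^\top\theta_0)^2 + \tfrac16 m'''(X_j^\top\theta_0)(\theta_0^\top X_{ij})^3\\
&\qquad - b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j)X_{ij}^\top\vartheta - b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\} + o(|\delta_\vartheta|) + O\{(X_{ij}^\top\vartheta)^4 + |\delta_\vartheta|^2\}.
\end{aligned}$$

As $m(X_i^\top\theta_0) - m_\vartheta(X_j) - m_\vartheta'(X_j)X_{ij}^\top\vartheta = o(1)$, by the continuity of $G_1(t; X)$ in $t$ we have

$$\begin{aligned}
&E[\varphi\{Y_i - m_\vartheta(X_j) - m_\vartheta'(X_j)X_{ij}^\top\vartheta\} \mid X_i] = G_1\{m(X_i^\top\theta_0) - m_\vartheta(X_j) - m_\vartheta'(X_j)X_{ij}^\top\vartheta; X_i\}\\
&\quad = b_j\delta_\vartheta^\top g(X_i)X_{ij} - b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\}g(X_i) - b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j)g(X_i)X_{ij}^\top\vartheta\\
&\qquad + \tfrac12 m''(X_j^\top\theta_0)g(X_i)(\theta_0^\top X_{ij})^2 + \tfrac16 m'''(X_j^\top\theta_0)g(X_i)(\theta_0^\top X_{ij})^3 + o(|\delta_\vartheta|) + O\{(X_{ij}^\top\vartheta)^4\},
\end{aligned} \tag{34}$$

and thus

$$E_i[K_{ij}^\vartheta\varphi\{Y_i - m_\vartheta(X_j) - m_\vartheta'(X_j)X_{ij}^\top\vartheta\}] = \tfrac12 m''(X_j^\top\theta_0)(gf)_\vartheta(X_j)h^3 + o(h|\delta_\vartheta|) + O(h^4).$$

Similarly, (33) follows from (34) and the following facts:

$$\begin{aligned}
E[g(X_i)X_{ij} \mid X_i^\top\vartheta = X_j^\top\vartheta + hu] &= \nu_\vartheta(X_j^\top\vartheta + hu) - X_j\mu_\vartheta(X_j^\top\vartheta + hu)\\
&= \nu_\vartheta(X_j^\top\vartheta) + hu\nu_\vartheta'(X_j^\top\vartheta) - X_j\mu_\vartheta(X_j^\top\vartheta) - huX_j\mu_\vartheta'(X_j^\top\vartheta) + O(h^2),\\
E[g(X_i) \mid X_i^\top\vartheta = X_j^\top\vartheta + hu] &= \mu_\vartheta(X_j^\top\vartheta) + hu\mu_\vartheta'(X_j^\top\vartheta) + O(h^2),\\
\int K(u)E[g(X_i)X_{ij} \mid X_i^\top\vartheta = X_j^\top\vartheta + hu]hu\,du &= h^2\{(f\nu')_\vartheta(X_j^\top\vartheta) - X_j(f\mu')_\vartheta(X_j^\top\vartheta)\}\\
&\quad + h^2\{(f'\nu)_\vartheta(X_j^\top\vartheta) - X_j(f'\mu)_\vartheta(X_j^\top\vartheta)\} + O(h^4),\\
\int K(u)E[g(X_i) \mid X_i^\top\vartheta = X_j^\top\vartheta + hu]hu\,du &= h^2(\mu'f + \mu f')_\vartheta(X_j^\top\vartheta) + O(h^4),\\
\int K(u)E[g(X_i) \mid X_i^\top\vartheta = X_j^\top\vartheta + hu]h^2u^2\,du &= h^2(\mu f)_\vartheta(X_j^\top\vartheta) + O(h^4). \qquad ■
\end{aligned}$$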

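Lemma 6.7 Let $\{\lambda_n(\theta) : \theta \in \Theta\}$ be a sequence of random convex functions defined on a convex, open subset $\Theta$ of $\mathbb{R}^d$. Suppose $\lambda(\theta)$ is a real-valued function on $\Theta$ such that $\lambda_n(\theta) \to \lambda(\theta)$ almost surely for each $\theta$. Then for each compact set $K$ of $\Theta$, with probability 1,

$$\sup_{\theta \in K}|\lambda_n(\theta) - \lambda(\theta)| \to 0.$$

Proof The condition can be restated as follows: for any fixed $\theta \in \Theta$, there exists some $\Omega_\theta \subset \Omega$ such that $P(\Omega_\theta) = 1$ and

$$\lambda_n(\omega, \theta) - \lambda(\theta) \to 0 \quad \text{for any } \omega \in \Omega_\theta.$$

The conclusion can be restated as follows: for each compact set $K$ of $\Theta$, there exists some $\Omega_0 \subset \Omega$ such that

$$P(\Omega_0) = 1 \quad \text{and} \quad \sup_{\theta \in K}|\lambda_n(\omega, \theta) - \lambda(\theta)| \to 0 \quad \text{for any } \omega \in \Omega_0.$$

For such uniformity of convergence, it is enough to consider the case where $K$ is a cube with edges parallel to the coordinate directions $e_1, \ldots, e_d$, because every compact subset of $\Theta$ can be covered by finitely many such cubes. Let $\mathcal{K}_0 = K$ and let $K^{+\delta_0}$ be the larger cube constructed by adding an extra layer of cubes with side $\delta_0$ to $K$, where $\delta_0 > 0$ is small enough that $K^{+\delta_0} \subset \Theta$. Define $U_0$ as the finite set of all vertices of all the cubes that make up $K^{+\delta_0}$.

Now for $k = 1, 2, \ldots$, let $\epsilon_k = k^{-1}$. As convexity implies continuity, there is a $\delta_k$ with $0 < \delta_k < \delta_{k-1}$ such that $\lambda(\cdot)$ varies by less than $\epsilon_k/(d + 1)$ over each cube of side $3\delta_k$ that intersects $K$. Partition each cube in $\mathcal{K}_{k-1}$ into a union of cubes with side at most $\delta_k$ and denote by $\mathcal{K}_k$ the resulting union of cubes. Then expand $K$ to a larger cube $K^{+\delta_k}$ by adding an extra layer of these $\delta_k$-cubes around each face. As $\delta_k < \delta_{k-1}$, $K^{+\delta_k} \subset K^{+\delta_{k-1}}$ is still within $\Theta$. Define

$$U_k = \{\text{vertices of all the } \delta_k\text{-cubes that make up } K^{+\delta_k}\} \cup U_{k-1} \quad \text{and} \quad \Omega_k = \bigcap_{\theta \in U_k}\Omega_\theta.$$

As $U_k$ is finite, we have $P(\Omega_k) = 1$ and

$$\text{for any } \omega \in \Omega_k, \quad M_k^n(\omega) := \max_{\theta \in U_k}|\lambda_n(\omega, \theta) - \lambda(\theta)| \to 0 \quad \text{as } n \to \infty. \tag{35}$$

We first relate $M_k^n(\omega)$ to an upper bound for $\lambda_n(\omega, \theta) - \lambda(\theta)$ over $\theta \in K$, for any given $\omega \in \Omega_k$. For any fixed $k$, each $\theta$ in $K$ lies within a $\delta_k$-cube with vertices in $U_k$, so it can be written as a convex combination of those vertices,

$$\theta = \sum_{\theta_i \in U_k}\alpha_i\theta_i, \qquad \alpha_i \ge 0, \quad \sum_{\theta_i \in U_k}\alpha_i = 1.$$

Then for any given $\omega \in \Omega_k$, convexity of $\lambda_n(\omega, \theta)$ in $\theta$ gives

$$\begin{aligned}
\lambda_n(\omega, \theta) &\le \sum_{\theta_i \in U_k}\alpha_i\lambda_n(\omega, \theta_i)\\
&= \sum_{\theta_i \in U_k}\alpha_i\{\lambda_n(\omega, \theta_i) - \lambda(\theta_i)\} + \sum_{\theta_i \in U_k}\alpha_i\{\lambda(\theta_i) - \lambda(\theta)\} + \lambda(\theta)\\
&\le M_k^n(\omega) + \max_{\theta_i}|\lambda(\theta_i) - \lambda(\theta)| + \lambda(\theta),
\end{aligned}$$

where the maximum is over the vertices of the cube containing $\theta$. Therefore,

$$\lambda_n(\omega, \theta) - \lambda(\theta) \le M_k^n(\omega) + \epsilon_k. \tag{36}$$

Next we establish the companion lower bound. For any fixed $k$, each $\theta$ in $K$ lies within a $\delta_k$-cube with a vertex $\theta_0$ in $K \cap U_k$, so that

$$\theta = \theta_0 + \sum_{i=1}^{d}\delta_ie_i, \qquad |\delta_i| \le \delta_k, \quad i = 1, \ldots, d.$$

Without loss of generality, suppose $\delta_i > 0$ for each $i = 1, \ldots, d$. Define

$$\theta_{ik} = \theta_0 - \delta_i'e_i, \quad \text{where } \delta_i' = \min\{c \ge \delta_k : \theta_0 - ce_i \in U_k\}, \quad i = 1, \ldots, d.$$

As $\theta_0 \in K \cap U_k$, each $\delta_i'$ exists and $\delta_i' < 2\delta_k$. Write $\theta_0$ as a convex combination of $\theta$ and the $\theta_{ik}$, and denote the convex weights by $\beta$ and $\{\beta_i\}$. As $\delta_i \le \delta_k \le \delta_i'$, we have $\beta \ge 1/(d + 1)$ and

$$\begin{aligned}
\beta\lambda_n(\omega, \theta) &\ge \lambda_n(\omega, \theta_0) - \sum_i\beta_i\lambda_n(\omega, \theta_{ik}) && \text{(convexity of } \lambda_n(\omega, \cdot))\\
&\ge \lambda(\theta_0) - \sum_i\beta_i\lambda(\theta_{ik}) - 2M_k^n(\omega) && \text{(from (35))}\\
&\ge \lambda(\theta) - \epsilon_k/(d + 1) - \sum_i\beta_i[\lambda(\theta) + \epsilon_k/(d + 1)] - 2M_k^n(\omega)\\
&\ge \beta\lambda(\theta) - 2\epsilon_k/(d + 1) - 2M_k^n(\omega),
\end{aligned}$$

where the third inequality follows from the definition of $\delta_k$ and the fact that a cube of side $3\delta_k$ contains $\theta$, $\theta_0$ and all the $\theta_{ik}$. As $\beta \ge 1/(d + 1)$,

$$\lambda_n(\omega, \theta) - \lambda(\theta) \ge -2\epsilon_k - 2(d + 1)M_k^n(\omega).$$

This together with (36) implies that for any $k = 1, 2, \ldots$ there exists some $\Omega_k$ ($\supseteq \Omega_{k+1}$) such that $P(\Omega_k) = 1$ and

$$\text{for } \omega \in \Omega_k, \quad \sup_{\theta \in K}|\lambda_n(\omega, \theta) - \lambda(\theta)| \le 2(d + 1)M_k^n(\omega) + 2k^{-1}.$$

Let $\Omega_0 = \bigcap_{k=1}^\infty\Omega_k$. As $\{\Omega_k\}$ is a decreasing sequence with $P(\Omega_k) = 1$, we have $P(\Omega_0) = 1$ and, for any $\omega \in \Omega_0$,

$$\sup_{\theta \in K}|\lambda_n(\omega, \theta) - \lambda(\theta)| \le 2(d + 1)M_k^n(\omega) + 2k^{-1} \quad \text{for all } k \ge 1. \tag{37}$$

Note that as $n \to \infty$, $M_k^n(\omega) \to 0$ for each fixed $k$, by (35). Taking limits on both sides of (37),

$$\limsup_{n \to \infty}\sup_{\theta \in K}|\lambda_n(\omega, \theta) - \lambda(\theta)| \le \lim_{n \to \infty}2(d + 1)M_k^n(\omega) + 2k^{-1} = 2k^{-1} \quad \text{for all } k \ge 1.$$

This is equivalent to saying that, with probability 1, $\lim_{n \to \infty}\sup_{\theta \in K}|\lambda_n(\omega, \theta) - \lambda(\theta)| = 0$. ■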

We now list a number of facts from the literature that will be used in our proofs.

Lemma 6.8 [Korolyuk et al, 1989] Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables. With a symmetric kernel $\Phi: \mathcal{X}^m \to \mathbb{R}$, we consider the U-statistic

$$U_n = \binom{n}{m}^{-1}\sum_{1 \le i_1 < \cdots < i_m \le n}\Phi(X_{i_1}, \ldots, X_{i_m}).$$

Let $\theta = E\Phi(X_1, \ldots, X_m) < \infty$ and for $c = 0, 1, \ldots, m$ define

$$\Phi_c(x_1, \ldots, x_c) = E\{\Phi(X_1, \ldots, X_m) \mid X_1 = x_1, \ldots, X_c = x_c\}, \qquad \Phi_0 = \theta, \quad \Phi_m = \Phi,$$
$$g_c(x_1, \ldots, x_c) = \sum_{d=0}^{c}(-1)^{c-d}\sum_{1 \le j_1 < \cdots < j_d \le c}\Phi_d(x_{j_1}, \ldots, x_{j_d}), \qquad \sigma_1^2 = Eg_1^2(X_1).$$

Suppose $\sigma_1^2 > 0$ and $E|g_c|^{2c/(2c-1)} < \infty$ for all $c = 1, \ldots, m$. Then, with probability 1,

$$\limsup_{n \to \infty}\frac{n^{1/2}(U_n - \theta)}{(2m^2\sigma_1^2\log\log n)^{1/2}} = 1.$$

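Lemma 6.9 [Berbee's Lemma] Let $(X, Y)$ be an $\mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$-valued random vector. Then there exists an $\mathbb{R}^{d_2}$-valued random vector $Y^*$ which has the same distribution as $Y$ and

$$Y^* \text{ is independent of } X, \qquad P(Y^* \ne Y) = \beta(\sigma(X), \sigma(Y)), \tag{38}$$

where $\sigma(X)$ and $\sigma(Y)$ are the $\sigma$-algebras generated by $X$ and $Y$, respectively, and

$$\beta[\sigma(X), \sigma(Y)] = E\sup_{A \in \sigma(Y)}|P(A) - P(A \mid \sigma(X))|.$$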

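Lemma 6.10 $\beta[\sigma(X_1, Y_1), \sigma(\hat a_j, \hat b_j)] = O\{(nh/\log^3 n)^{-1/4}\}$.

Proof By definition,

$$\beta[\sigma(X_1, Y_1), \sigma(\hat a_j, \hat b_j)] = E\sup_{A \in \sigma(\hat a_j, \hat b_j)}|P(A) - P(A \mid \sigma(X_1, Y_1))|.$$

According to results in Welsh (1996), $[(\hat a_j - E\hat a_j)/\sigma_1, (\hat b_j - E\hat b_j)/\sigma_2]$ is asymptotically normal, where $\sigma_1 = \{\mathrm{Var}\,\hat a_j\}^{1/2} = O\{(nh)^{-1/2}\}$ and $\sigma_2 = \{\mathrm{Var}\,\hat b_j\}^{1/2} = O\{(nh^3)^{-1/2}\}$. Let $r_n = (nh/\log n)^{-3/4}$ and rewrite (16) as

$$\hat a_j = E\hat a_j + \frac{1}{nh}\{gf\}_\vartheta^{-1}(X_j)\sum_{i=2}^{n}\phi_{ij} + O(r_n),$$
$$\hat b_j = E\hat b_j + \frac{1}{nh^2}\{gf\}_\vartheta^{-1}(X_j)\sum_{i=2}^{n}\psi_{ij} + O(r_n/h). \tag{39}$$

Note that $\phi_{ij}$ and $\psi_{ij}$, $i = 1, \ldots, n$, are two sequences of zero-mean i.i.d. bounded random variables defined in (18), whence

$$\begin{aligned}
P\{\hat a_j < t_1, \hat b_j < t_2 \mid Y_1, X_1\} &\le P[\hat a_j < t_1 + Cr_n, \hat b_j < t_2 + Cr_n/h]\\
&= P\Big[\frac{\hat a_j - E\hat a_j}{\sigma_1} < \frac{t_1 - E\hat a_j + Cr_n}{\sigma_1}, \frac{\hat b_j - E\hat b_j}{\sigma_2} < \frac{t_2 - E\hat b_j + Cr_n/h}{\sigma_2}\Big]\\
&\le P[\hat a_j < t_1, \hat b_j < t_2] + C(nh)^{1/2}r_n,\\
P\{\hat a_j > t_1, \hat b_j > t_2 \mid Y_1, X_1\} &\ge P[\hat a_j > t_1 - Cr_n, \hat b_j > t_2 - Cr_n/h]\\
&= P\Big[\frac{\hat a_j - E\hat a_j}{\sigma_1} > \frac{t_1 - E\hat a_j - Cr_n}{\sigma_1}, \frac{\hat b_j - E\hat b_j}{\sigma_2} > \frac{t_2 - E\hat b_j - Cr_n/h}{\sigma_2}\Big]\\
&\ge P[\hat a_j > t_1, \hat b_j > t_2] - C(nh)^{1/2}r_n.
\end{aligned}$$

Therefore,

$$|P\{\hat a_j < t_1, \hat b_j < t_2 \mid Y_1, X_1\} - P\{\hat a_j < t_1, \hat b_j < t_2\}| \le C(nh)^{1/2}r_n = O\{(nh/\log^3 n)^{-1/4}\}.$$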

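Lemma 6.11 Under assumptions (A1)-(A5), $\Psi_n(\theta) = \sum_{i,j}K_{ij}^\vartheta\{\rho(Y_i - \hat a_j - \hat b_j\theta^\top X_{ij}) - \rho(Y_{ij})\}$ satisfies

$$E\Psi_n(\theta) = \delta_\theta^\top ER_{n1} + \tfrac12\delta_\theta^\top G_{n,\vartheta}\delta_\theta + o(n^2h|\delta_\theta|^2).$$

Proof Apparently it suffices to show that

$$\begin{aligned}
&E K_{1j}^\vartheta\{\rho(Y_1 - \hat a_j - \hat b_j\theta^\top X_{1j}) - \rho(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\}\\
&\quad = \delta_\theta^\top E[K_{1j}^\vartheta\varphi(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\hat b_jX_{1j}] + \tfrac12\delta_\theta^\top E[K_{1j}^\vartheta X_{1j}X_{1j}^\top g(X_1)b_j^2]\delta_\theta + o(|\delta_\theta|^2).
\end{aligned}$$

By the smoothness of $E[\rho(Y_1 - \hat a_j - t\hat b_j) \mid \mathcal{X}]$ in $t$, where $\mathcal{X} = \sigma(X_1, \ldots, X_n)$, a second-order Taylor expansion in $t$ gives

$$\begin{aligned}
&E\{\rho(Y_1 - \hat a_j - \hat b_j\theta^\top X_{1j}) - \rho(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j}) \mid \mathcal{X}\}\\
&\quad = \delta_\theta^\top X_{1j}E[\varphi(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\hat b_j \mid \mathcal{X}] - \tfrac12\delta_\theta^\top X_{1j}X_{1j}^\top\delta_\theta\,\frac{\partial}{\partial t}E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j \mid \mathcal{X}\}\Big|_{t = t^*},
\end{aligned}$$

where $t^*$ is some value between $\theta^\top X_{1j}$ and $\theta_0^\top X_{1j}$. Taking expectations of both sides, we have

$$E K_{1j}^\vartheta\{\rho(Y_1 - \hat a_j - \hat b_j\theta^\top X_{1j}) - \rho(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\} = \delta_\theta^\top E[K_{1j}^\vartheta\varphi(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\hat b_jX_{1j}] + \tfrac12\delta_\theta^\top(A_1 + A_2)\delta_\theta, \tag{40}$$

$$A_1 = -E\Big[K_{1j}^\vartheta X_{1j}X_{1j}^\top\frac{\partial}{\partial t}E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j \mid \mathcal{X}\}\Big|_{t = X_{1j}^\top\theta_0}\Big],$$
$$A_2 = -E\Big[K_{1j}^\vartheta X_{1j}X_{1j}^\top\Big\{\frac{\partial}{\partial t}E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j \mid \mathcal{X}\}\Big|_{t = t^*} - \frac{\partial}{\partial t}E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j \mid \mathcal{X}\}\Big|_{t = X_{1j}^\top\theta_0}\Big\}\Big].$$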

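To study $A_1$, we need to compute $\partial E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j \mid \mathcal{X}\}/\partial t$. To this end, we apply Lemma 6.9 and Lemma 6.10. Suppose $[\tilde a_j, \tilde b_j]$ has the same distribution as $[\hat a_j, \hat b_j]$, is independent of $(X_1, Y_1)$, and satisfies $P([\tilde a_j, \tilde b_j] \ne [\hat a_j, \hat b_j]) = O\{(nh/\log^3 n)^{-1/4}\}$. Thus for any $\delta \to 0$,

$$\begin{aligned}
&E[\varphi\{Y_1 - \hat a_j - \hat b_j(t + \delta)\}\hat b_j \mid \mathcal{X}] - E[\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j \mid \mathcal{X}]\\
&\quad = E[\{\varphi(Y_1 - \tilde a_j - \tilde b_j(t + \delta)) - \varphi(Y_1 - \tilde a_j - \tilde b_jt)\}\tilde b_j \mid \mathcal{X}]\\
&\qquad + E[\{\varphi(Y_1 - \hat a_j - \hat b_j(t + \delta)) - \varphi(Y_1 - \hat a_j - \hat b_jt)\}\hat b_jI\{[\tilde a_j, \tilde b_j] \ne [\hat a_j, \hat b_j]\} \mid \mathcal{X}]\\
&\qquad - E[\{\varphi(Y_1 - \tilde a_j - \tilde b_j(t + \delta)) - \varphi(Y_1 - \tilde a_j - \tilde b_jt)\}\tilde b_jI\{[\tilde a_j, \tilde b_j] \ne [\hat a_j, \hat b_j]\} \mid \mathcal{X}]\\
&\quad = T_1 + T_2 + T_3.
\end{aligned} \tag{41}$$

Based on the definition of $G_1(s; X)$, and since $Y_1$ is independent of $[\tilde a_j, \tilde b_j]$, we have, with $a_1 = m(X_1^\top\theta_0)$,

$$\begin{aligned}
T_1 &= E[\{G_1(a_1 - \tilde a_j - \tilde b_j(t + \delta); X_1) - G_1(a_1 - \tilde a_j - \tilde b_jt; X_1)\}\tilde b_j \mid \mathcal{X}]\\
&= -\delta E[G_2(a_1 - \tilde a_j - \tilde b_jt; X_1)\tilde b_j^2 \mid \mathcal{X}] + o(\delta),
\end{aligned} \tag{42}$$

where the last equality follows from the continuity of $G_2(t; X)$ in $t$.

Next, we show that $T_2 = o(\delta)$. As mentioned in the proof of Lemma 6.10, $[v_1, v_2] = [(\hat a_j - E\hat a_j)/\sigma_1, (\hat b_j - E\hat b_j)/\sigma_2]$ is asymptotically normal, where

$$\sigma_1 = \{\mathrm{Var}\,\hat a_j\}^{1/2} = O\{(nh)^{-1/2}\}, \qquad \sigma_2 = \{\mathrm{Var}\,\hat b_j\}^{1/2} = O\{(nh^3)^{-1/2}\}.$$

Similarly construct $[\tilde v_1, \tilde v_2]$ from $\tilde a_j$ and $\tilde b_j$. Without loss of generality, consider a small $\delta > 0$. It is easy to see that the conditional probability density function of $Y_1$ given $[v_1, v_2]$ is uniformly bounded. Therefore, for any given values of $\hat a_j$ and $\hat b_j$ (equivalently $v_1$ and $v_2$),

$$|E\{\varphi(Y_1 - \hat a_j - \hat b_j(t + \delta)) - \varphi(Y_1 - \hat a_j - \hat b_jt) \mid v_1, v_2\}| \le C\delta|\hat b_j|.$$

Let $f(\tilde v_1, \tilde v_2 \mid v_1, v_2)$ be the conditional probability density function of $(\tilde v_1, \tilde v_2)$ given $(v_1, v_2)$, and

$$g(v_1, v_2) = \int_{[\tilde v_1, \tilde v_2] \ne [v_1, v_2]}f(\tilde v_1, \tilde v_2 \mid v_1, v_2)\,d\tilde v_1\,d\tilde v_2.$$

As $\int f(v_1, v_2)g(v_1, v_2)\,dv_1\,dv_2 = P([\tilde a_j, \tilde b_j] \ne [\hat a_j, \hat b_j]) = O\{(nh/\log^3 n)^{-1/4}\}$, we have

$$|T_2| \le C\delta\int|\hat b_j|f(v_1, v_2)g(v_1, v_2)\,dv_1\,dv_2 = o(\delta).$$

Similarly we can show that $T_3 = o(\delta)$. This, together with (41) and (42), yields

$$\frac{\partial}{\partial t}E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j \mid \mathcal{X}\} = -E[G_2(a_1 - \tilde a_j - \tilde b_jt; X_1)\tilde b_j^2 \mid \mathcal{X}]. \tag{43}$$

Applying this result to $A_1$ and $A_2$, we have

$$A_1 = E[K_{1j}^\vartheta X_{1j}X_{1j}^\top G_2(a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0; X_1)\tilde b_j^2], \qquad A_2 = O(\delta_\theta).$$

Plugging these into (40) leads to

$$\begin{aligned}
&E K_{1j}^\vartheta\{\rho(Y_1 - \hat a_j - \hat b_j\theta^\top X_{1j}) - \rho(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\}\\
&\quad = \delta_\theta^\top E[K_{1j}^\vartheta\varphi(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\hat b_jX_{1j}] + \tfrac12\delta_\theta^\top E[K_{1j}^\vartheta X_{1j}X_{1j}^\top G_2(a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0; X_1)\tilde b_j^2]\delta_\theta + o(|\delta_\theta|^2)\\
&\quad = \delta_\theta^\top E[K_{1j}^\vartheta\varphi(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\hat b_jX_{1j}] + \tfrac12\delta_\theta^\top E[K_{1j}^\vartheta X_{1j}X_{1j}^\top g(X_1)b_j^2]\delta_\theta + o(|\delta_\theta|^2),
\end{aligned}$$

where the last equality follows from the continuity of $G_2(t; X_1)$ in $t$ and (19). ■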

Lemma 6.12 Define = KfjbjXij{ip(Yij) — ip(ei)}. Then 

h^EiZij = -8~lb){{v/^{X 3 ) - XjUMXj) - X j/ i#(X j )} T + o(\6#\ + n- 1 ' 2 ), (44) 

- EiZij) = o(n 2 h8#), (45) 

m 

(nh)- 1 £ Kfj^ih - bj)Xij = oin" 1 / 2 ) + 0{5„ {nh/ log n)" 1 ^} (46) 

i 

uniformly in 

Proof Once again we apply Lemma [6791 and suppose [dj, bj] has the same distribution as [dj, b j] 
and is independent of {Xx,Yx). By Lemma E033 P([5j,6j] / = 0{(n/i/ log 3 n)" 1 / 4 }. 

Recall X = a(X x , ■ ■ ■ ,X n ). Note that E x Z X j = E[KxjXxj(Tx - T 2 + T 3 )], where 

- a,- - Xj^obj) - ip(ex)}bj\X] =T 1 -T 2 + T 3 , 
Ti = E[{p{Y x ~ aj - bjXjje ) - ip(ex)}bj\X] 
T 2 = E[{<p(Yx - aj - bjXjjOo) - <p(ex)}bjl{[dj,bj] ± [&j,bj]}\X] 
T 3 = E[MYx - aj - bjXlOo) - ¥»(e 1 )}S i /{[a,-,6 i ] + [dj,bj]}\X]. 

Similar to (1421). we can conclude that 



Ti = E[{Gx(ax-aj-bjX[je ;Xx)-Gx(0;Xx)}bj\X] (47) 
= g(Xx)E{bj(ax - dj - bjXjje )\X} + 0[E{(ax - dj - IjXljO^X}]. 
Using the results on the asymptotic bias and variance of $(\tilde a_j, \tilde b_j)$ in (19), we can see that

$$E\{K^\vartheta_{1j}(a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0)^2\} = O(h\delta_\vartheta^2 + n^{-1}).$$
Next we deal with the first term in (47). Using (16),

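$$\begin{aligned}
a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0 &= (a_1 - a_j - b_jX_{1j}^\top\theta_0) + (a_j - \tilde a_j) - (\tilde b_j - b_j)X_{1j}^\top\theta_0\\
&= \tfrac12m''(X_j^\top\theta_0)\{(X_{1j}^\top\theta_0)^2 - h^2\} + O\{(X_{1j}^\top\theta_0)^3\}\\
&\quad + b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\} - b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j)X_{1j}^\top\theta_0\\
&\quad - h^2[\tfrac12m''(X_j^\top\theta_0)\{(f\mu)'/(f\mu)\}_\vartheta(X_j) + \tfrac16m'''(X_j^\top\theta_0)]X_{1j}^\top\theta_0\\
&\quad - \frac{1}{nh}\sum_{i=1}^n\psi_{ij} - \frac{X_{1j}^\top\theta_0}{nh^2}\sum_{i=1}^n\phi_{ij} + O\{(nh/\log n)^{-3/4}(1 + \delta_\vartheta/h) + h^3\},
\end{aligned} \tag{48}$$

where $\psi_{ij}$ and $\phi_{ij}$ are zero-mean IID random variables. Hence

$$\begin{aligned}
E[K^\vartheta_{1j}X_{1j}T_1] &= E[K^\vartheta_{1j}g(X_1)X_{1j}b_j(a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0)] + o(h|\delta_\vartheta| + n^{-1/2}h)\\
&= -h\delta_\vartheta^\top b_j^2\{(\nu/\mu)_\vartheta(X_j) - X_j\}\{\nu_\vartheta(X_j) - X_j\mu_\vartheta(X_j)\}^\top + o(h|\delta_\vartheta| + hn^{-1/2})
\end{aligned} \tag{49}$$

uniformly in $\vartheta$, where (19) is used in the last step.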

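As $P([\tilde a_j, \tilde b_j] \neq [\hat a_j, \hat b_j]) = O\{(nh/\log^3 n)^{-1/4}\}$, we have, similarly to $T_2$ above,

$$E[K^\vartheta_{1j}X_{1j}T_2] = o(hn^{-1/2}) + o(h\delta_\vartheta), \qquad E[K^\vartheta_{1j}X_{1j}T_3] = o(n^{-1/2}h) + o(h\delta_\vartheta)$$

uniformly in $\vartheta$. This together with (49) yields (44).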
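To prove (45), first note that

$$\varphi(Y_i - \hat a_j - \hat b_j\theta_0^\top X_{ij}) - \varphi(\varepsilon_i) = [\varphi(Y_i - \hat a_j - \hat b_j\theta_0^\top X_{ij}) - \varphi(Y_i - a_j - b_j\theta_0^\top X_{ij})] + [\varphi(Y_i - a_j - b_j\theta_0^\top X_{ij}) - \varphi(\varepsilon_i)].$$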

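Let $Z_{ij} = K^\vartheta_{ij}X_{ij}\{\varphi(Y_i - a_j - b_j\theta_0^\top X_{ij}) - \varphi(\varepsilon_i)\}$. By Lemma MUM it suffices to show that

$$\sum_jb_j\sum_i(Z_{ij} - EZ_{ij}) = o(n^2h\delta_\vartheta), \tag{50}$$

$$\sum_j(\hat b_j - b_j)\sum_iZ_{ij} = o(n^2h\delta_\vartheta). \tag{51}$$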

By the Borel-Cantelli lemma, (50) can be further reduced to showing that, for any $\epsilon > 0$,

$$nP\Big\{\Big|\sum_ib_j(Z_{ij} - EZ_{ij})\Big| > \epsilon nh\delta_\vartheta\Big\} \text{ is summable over } n, \tag{52}$$
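which follows from the facts that $Z_{ij}$ is bounded, $EZ_{ij}^2 = O(h^3 + h\delta_\vartheta)$, and Bernstein's inequality,

$$P\Big\{\Big|\sum_i(Z_{ij} - EZ_{ij})\Big| > \epsilon nh\delta_\vartheta\Big\} \le C\exp\Big[-\frac{\epsilon^2n^2h^2\delta_\vartheta^2}{C\{n(h^3 + h\delta_\vartheta) + \epsilon nh\delta_\vartheta\}}\Big].$$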
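Bernstein's inequality is what makes a small variance such as $EZ_{ij}^2 = O(h^3 + h\delta_\vartheta)$ pay off: for $n$ independent mean-zero terms bounded by $b$ with variance $\sigma^2$, $P(|\sum_i\xi_i| > t) \le 2\exp[-t^2/\{2(n\sigma^2 + bt/3)\}]$, whereas Hoeffding's inequality uses only the range of the terms. The sketch compares both bounds with the exact tail of a centred Binomial$(n, p)$ sum, a stand-in for a sum of kernel-localised indicators in which the small success probability $p$ plays the role of $h$; the numbers are illustrative assumptions, not values from the paper.

```python
import math


def bernstein_bound(n, var, bound, t):
    """Two-sided Bernstein bound for a sum of n independent mean-zero terms
    with |term| <= bound and variance var:
    P(|sum| > t) <= 2 exp(-t^2 / (2 (n var + bound t / 3)))."""
    return 2.0 * math.exp(-t * t / (2.0 * (n * var + bound * t / 3.0)))


def hoeffding_bound(n, width, t):
    """Hoeffding bound when each term lies in an interval of length `width`."""
    return 2.0 * math.exp(-2.0 * t * t / (n * width * width))


def exact_binomial_tail(n, p, t):
    """Exact P(|S - n p| > t) for S ~ Binomial(n, p)."""
    mean = n * p
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k - mean) > t
    )


if __name__ == "__main__":
    # Rare-event regime: a small p plays the role of the kernel window h.
    n, p, t = 500, 0.02, 10.5
    exact = exact_binomial_tail(n, p, t)
    bern = bernstein_bound(n, p * (1.0 - p), 1.0 - p, t)
    hoef = hoeffding_bound(n, 1.0, t)
    print(f"exact={exact:.2e}  Bernstein={bern:.2e}  Hoeffding={hoef:.2e}")
```

With these inputs the Bernstein bound is small while the Hoeffding bound exceeds 1, which is why the proofs track variances of order $h$ rather than ranges.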



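To prove (51), we again use the expansion of $\hat b_j - b_j$ given in (16), i.e.

$$\hat b_j - b_j = h^2[\tfrac12m''(X_j^\top\theta_0)\{(f\mu)'/(f\mu)\}_\vartheta(X_j) + \tfrac16m'''(X_j^\top\theta_0)] + b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j) + \frac{1}{nh^2}\sum_{i=1}^n\phi_{ij} + O\{(nh/\log n)^{-3/4}/h\},$$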

where $E\phi_{ij} = 0$. If we denote by $C(X_j)$ the deterministic (bias) term in $\hat b_j - b_j$, it is easy to see that $\sum_{i,j}C(X_j)Z_{ij} = o(n^2h\delta_\vartheta)$. For the stochastic part, write

$$\sum_{i,l}Z_{ij}\phi_{lj} = \sum_iZ_{ij}\phi_{ij} + \sum_{i\neq l}Z_{ij}\phi_{lj}. \tag{53}$$

We focus on the second term, as the first term is relatively negligible. Let $c = EZ_{ij} = O(h^3 + h\delta_\vartheta)$, whence the contribution of the second term in (53) to (51) is $(nh^2)^{-1}\sum_j(T_{1j} + cT_{2j})$, where

$$T_{1j} = \sum_{i<l}\{\phi_{lj}(Z_{ij} - c) + \phi_{ij}(Z_{lj} - c)\}, \qquad T_{2j} = \sum_{i<l}(\phi_{ij} + \phi_{lj}).$$

By the second statement in Lemma 6.1 in Xia (2007), replacing $\theta$ there with $(\vartheta^\top, X_j^\top)^\top$, we know that with probability 1, $T_{1j} = O\{n\log n(h^3 + h\delta_\vartheta)^{1/2}\}$ uniformly in $\vartheta$ and $j$. On the other hand, by the law of the iterated logarithm for U-statistics in Korolyuk et al. (see Lemma 6.8), $\sum_jT_{2j} = O\{n^{3/2}(h\log\log n)^{1/2}\}$ a.s. Since $c = O(h^3 + h\delta_\vartheta)$, we have

$$\sum_j(T_{1j} + cT_{2j}) = O\{n^2\log n(h^3 + h\delta_\vartheta)^{1/2} + n^{3/2}(h\log\log n)^{1/2}(h^3 + h\delta_\vartheta)\} = o(n^2h\delta_\vartheta).$$
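The passage from (53) to $T_{1j} + cT_{2j}$ rests on the algebraic identity $\sum_{i\neq l}Z_{ij}\phi_{lj} = T_{1j} + cT_{2j}$, which holds for any constant $c$. Choosing $c = EZ_{ij}$ centres the summands of $T_{1j}$, while $T_{2j} = (n-1)\sum_i\phi_{ij}$ is a multiple of a sum of zero-mean IID terms, which is why a law of the iterated logarithm applies to it. A small check on random numbers, with illustrative function names and inputs:

```python
import random


def offdiagonal_sum(z, phi):
    """Sum over i != l of z[i] * phi[l]."""
    n = len(z)
    return sum(z[i] * phi[l] for i in range(n) for l in range(n) if i != l)


def u_statistic_parts(z, phi, c):
    """Return (T1, T2) with T1 = sum_{i<l} {phi[l](z[i]-c) + phi[i](z[l]-c)}
    and T2 = sum_{i<l} (phi[i] + phi[l])."""
    n = len(z)
    t1 = t2 = 0.0
    for i in range(n):
        for l in range(i + 1, n):
            t1 += phi[l] * (z[i] - c) + phi[i] * (z[l] - c)
            t2 += phi[i] + phi[l]
    return t1, t2


if __name__ == "__main__":
    rng = random.Random(0)
    n = 50
    z = [rng.random() for _ in range(n)]            # plays the role of Z_ij, j fixed
    phi = [rng.gauss(0.0, 1.0) for _ in range(n)]   # zero-mean phi_ij, j fixed
    c = 0.5                                          # stands in for E Z_ij
    t1, t2 = u_statistic_parts(z, phi, c)
    print(offdiagonal_sum(z, phi), t1 + c * t2)      # identical up to rounding
    print(t2, (n - 1) * sum(phi))                    # T2 = (n - 1) * sum(phi)
```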

The proof of (46) can be done in exactly the same manner as that of (51). ■
The proof of (26) consists of the following two lemmas.



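Lemma 6.13 Let

$$R^*_{n2}(\vartheta) = \sum_{i,j}K^\vartheta_{ij}\{\rho(Y_i - \hat a_j - \hat b_j\vartheta^\top X_{ij}) - \rho(Y_{ij}) - \delta_\vartheta^\top\varphi(Y_i - a_j - b_jX_{ij}^\top\theta_0)\hat b_jX_{ij}\}.$$

Then with probability 1, we have

$$(n^2ha^2_{n\vartheta})^{-1}[R^*_{n2}(\vartheta) - ER^*_{n2}(\vartheta)] = o(1) \tag{54}$$

uniformly in $\vartheta$.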
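Proof Define $X_{ix} = X_i - x$, $\mu_{ix} = (1, X_{ix}^\top)^\top$, $K_{ix} = K(X_{ix}^\top\vartheta/h)$, $\beta(x) = [m(\theta_0^\top x), m'(\theta_0^\top x)\theta_0^\top]^\top$ and $\varphi_{ni}(x; t) = \varphi(Y_i; \mu_{ix}^\top\beta(x) + t)$. For any $\alpha, \beta \in \mathbb R^{d+1}$, let

$$\begin{aligned}
\Phi_{ni}(x; \alpha, \beta) &= K_{ix}[\rho\{Y_i; \mu_{ix}^\top(\alpha + \beta + \beta(x))\} - \rho\{Y_i; \mu_{ix}^\top(\beta + \beta(x))\} - \varphi_{ni}(x; 0)\mu_{ix}^\top\alpha]\\
&= K_{ix}\int_{\mu_{ix}^\top\beta}^{\mu_{ix}^\top(\alpha + \beta)}\{\varphi_{ni}(x; t) - \varphi_{ni}(x; 0)\}\,dt
\end{aligned}$$

and $R_{ni}(x; \alpha, \beta) = \Phi_{ni}(x; \alpha, \beta) - E\Phi_{ni}(x; \alpha, \beta)$. Clearly,

$$K^\vartheta_{ij}\{\rho(Y_i - \hat a_j - \hat b_j\vartheta^\top X_{ij}) - \rho(Y_{ij}) - \delta_\vartheta^\top\varphi(Y_i - a_j - b_jX_{ij}^\top\theta_0)\hat b_jX_{ij}\} = \Phi_{ni}(X_j; \alpha, \beta)$$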



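with $\alpha = [0, \hat b_j\delta_\vartheta^\top]^\top$ and $\beta = [\hat a_j - a_j, (\hat b_j - b_j)\theta_0^\top]^\top$. Let $[a_x, b_x] = [m(\theta_0^\top x), m'(\theta_0^\top x)]$ and let $\mathcal D$ be any compact subset of the support of $X$. For any $M > 0$ and $\vartheta \in \Theta_n$, define

$$M^\vartheta_{n1} = Ca_{n\vartheta}, \qquad M^\vartheta_{n2} = C\{|\delta_\vartheta| + (nh/\log n)^{-1/2}\}, \qquad M^\vartheta_{n3} = C\{|\delta_\vartheta| + (nh/\log n)^{-1/2}/h\},$$

$$B^{(1)}_n = \{\alpha \in \mathbb R^{d+1} : \alpha = [0, a^\top]^\top,\ |a| \le M^\vartheta_{n1}\},$$

$$B^{(2)}_n = \{\beta \in \mathbb R^{d+1} : \beta = [b_1, b_2\theta_0^\top]^\top,\ |b_1| \le M^\vartheta_{n2},\ |b_2| \le M^\vartheta_{n3}\}.$$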
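The integral form of $\Phi_{ni}$ rests on an exact identity for the check loss. Writing $\rho(Y; s) = \rho_\tau(Y - s)$ and taking $\varphi(Y; s)$ to be its derivative in $s$, namely $I(Y - s < 0) - \tau$ (this sign convention is an assumption, chosen so that the two-argument display above holds), one has $\rho(Y; s + \alpha) - \rho(Y; s) - \varphi(Y; s)\alpha = \int_0^\alpha\{\varphi(Y; s + t) - \varphi(Y; s)\}\,dt \ge 0$, and the integrand vanishes unless the kink of $\rho$ lies between $s$ and $s + \alpha$. That last fact is what the indicators $I^{a,b}_i$ and $I^{\alpha,\beta}_i$ exploit in Lemmas 6.14 and 6.15. A numerical check with a midpoint rule; all names are illustrative:

```python
def rho(y, s, tau):
    """Check loss rho(y; s) = rho_tau(y - s) with rho_tau(v) = v * (tau - I(v < 0))."""
    v = y - s
    return v * (tau - (1.0 if v < 0 else 0.0))


def score(y, s, tau):
    """Derivative of rho(y; s) in s (away from s = y): I(y - s < 0) - tau."""
    return (1.0 if y - s < 0 else 0.0) - tau


def remainder(y, s, alpha, tau):
    """rho(y; s + alpha) - rho(y; s) - score(y; s) * alpha."""
    return rho(y, s + alpha, tau) - rho(y, s, tau) - score(y, s, tau) * alpha


def remainder_integral(y, s, alpha, tau, n_grid=20_000):
    """Midpoint rule for int_0^alpha {score(y; s + t) - score(y; s)} dt (alpha may be < 0)."""
    step = alpha / n_grid
    base = score(y, s, tau)
    return step * sum(score(y, s + (k + 0.5) * step, tau) - base for k in range(n_grid))


if __name__ == "__main__":
    # First case crosses the kink at s = y; second does not (remainder is 0).
    for y, s, alpha in [(1.0, 0.5, 0.8), (1.0, 0.5, 0.3), (0.0, 0.4, -0.9)]:
        print(remainder(y, s, alpha, 0.3), remainder_integral(y, s, alpha, 0.3))
```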

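As $|\alpha| \le Ca_{n\vartheta}$, $|\hat a_j - a_j| = O\{|\delta_\vartheta| + (nh/\log n)^{-1/2}\}$ and $|\hat b_j - b_j| = O\{|\delta_\vartheta| + (nh/\log n)^{-1/2}/h\}$, (54) will follow if, for any $\epsilon > 0$,

$$\sup_{x\in\mathcal D}\ \sup_{\alpha\in B^{(1)}_n,\ \beta\in B^{(2)}_n}\Big|\sum_{i=1}^nR_{ni}(x; \alpha, \beta)\Big| < \epsilon d_n \quad \text{a.s.}, \qquad d_n = nha^2_{n\vartheta}. \tag{55}$$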



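This is done in a similar style as Lemma 4.2 in Kong et al. (2008). Cover $\mathcal D$ by a finite number $T_n$ of cubes $\mathcal D_k = \mathcal D_{n,k}$ with side length $l_n = O\{h(nh/\log n)^{-1/4}\}$ and centers $x_k = x_{n,k}$. Write

$$\begin{aligned}
\sup_{x\in\mathcal D}\ \sup_{\alpha\in B^{(1)}_n,\ \beta\in B^{(2)}_n}\Big|\sum_{i=1}^nR_{ni}(x; \alpha, \beta)\Big| &\le \max_{1\le k\le T_n}\ \sup_{\alpha\in B^{(1)}_n,\ \beta\in B^{(2)}_n}\Big|\sum_{i=1}^nR_{ni}(x_k; \alpha, \beta)\Big|\\
&\quad + \max_{1\le k\le T_n}\ \sup_{x\in\mathcal D_k}\ \sup_{\alpha\in B^{(1)}_n,\ \beta\in B^{(2)}_n}\sum_{i=1}^n|\Phi_{ni}(x_k; \alpha, \beta) - \Phi_{ni}(x; \alpha, \beta)|\\
&\quad + \max_{1\le k\le T_n}\ \sup_{x\in\mathcal D_k}\ \sup_{\alpha\in B^{(1)}_n,\ \beta\in B^{(2)}_n}\sum_{i=1}^n|E\Phi_{ni}(x_k; \alpha, \beta) - E\Phi_{ni}(x; \alpha, \beta)|\\
&= Q_1 + Q_2 + Q_3.
\end{aligned}$$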
In Lemma 6.15 we will prove that $Q_2 = o(d_n)$ a.s., whence $Q_3 \le EQ_2 = o(d_n)$. It remains to show that $Q_1 < \epsilon d_n/3$ a.s., which can be done following a similar proof style as in Lemma 4.2 in Kong et al. (2008).

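Partition $B^{(i)}_n$, $i = 1, 2$, into a sequence of subrectangles $D^{(i)}_1, \dots, D^{(i)}_{J_1}$, $i = 1, 2$, such that for all $1 \le j_1 \le J_1 \le M^{d+1}$ ($M = \epsilon^{-1}$) and for all $\alpha, \alpha' \in D^{(1)}_{j_1}$, we have $|\alpha - \alpha'| \le M^\vartheta_{n1}/M$, and for all $\beta = [b_1, b_2\theta_0^\top]^\top$, $\beta' = [b_1', b_2'\theta_0^\top]^\top \in D^{(2)}_{j_1}$, we have $|b_1 - b_1'| \le M^\vartheta_{n2}/M$ and $|b_2 - b_2'| \le M^\vartheta_{n3}/M$. Choose a point $\alpha_{j_1} \in D^{(1)}_{j_1}$ and $\beta_{k_1} \in D^{(2)}_{k_1}$, $1 \le j_1, k_1 \le J_1$. Then for any $x$,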

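$$\begin{aligned}
\sup_{\alpha\in B^{(1)}_n,\ \beta\in B^{(2)}_n}\Big|\sum_{i=1}^nR_{ni}(x; \alpha, \beta)\Big| &\le \max_{1\le j_1,k_1\le J_1}\ \sup_{\alpha\in D^{(1)}_{j_1},\ \beta\in D^{(2)}_{k_1}}\Big|\sum_{i=1}^n\{R_{ni}(x; \alpha_{j_1}, \beta_{k_1}) - R_{ni}(x; \alpha, \beta)\}\Big|\\
&\quad + \max_{1\le j_1,k_1\le J_1}\Big|\sum_{i=1}^nR_{ni}(x; \alpha_{j_1}, \beta_{k_1})\Big| = H_{n1} + H_{n2}.
\end{aligned} \tag{56}$$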
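Bound (56) is the standard discretisation step: a supremum over a box is at most the maximum over finitely many grid points (the role of $H_{n2}$) plus the oscillation inside a cell (the role of $H_{n1}$). For a function with Lipschitz constant $L$ the oscillation is at most $L$ times the cell radius, which is why cells of width $M^\vartheta_{n1}/M$ (resp. $M^\vartheta_{n2}/M$, $M^\vartheta_{n3}/M$) are used. The sketch illustrates only this deterministic part on a box in $\mathbb R^2$; the function `f`, its Lipschitz constant and the grid size are illustrative choices, and the probabilistic step (Bernstein's inequality at each grid point) is not reproduced.

```python
import itertools
import math
import random


def grid_cells(lower, upper, m):
    """Split the box prod_k [lower[k], upper[k]] into m**d congruent cells.

    Returns the list of cell centres and the half-widths of a cell.
    """
    half = [(u - l) / (2 * m) for l, u in zip(lower, upper)]
    centres = [
        tuple(l + (2 * i + 1) * hw for l, i, hw in zip(lower, idx, half))
        for idx in itertools.product(range(m), repeat=len(lower))
    ]
    return centres, half


def f(a):
    """Illustrative function on [-1, 1]^2 with |grad f| <= sqrt(1 + 4)."""
    return math.sin(a[0]) + a[1] ** 2


LIP = math.sqrt(5.0)  # Lipschitz constant of f on [-1, 1]^2 (Euclidean norm)

if __name__ == "__main__":
    lower, upper, m = (-1.0, -1.0), (1.0, 1.0), 8
    centres, half = grid_cells(lower, upper, m)
    radius = math.sqrt(sum(hw * hw for hw in half))          # max distance to a centre
    bound = max(abs(f(c)) for c in centres) + LIP * radius   # grid max + oscillation
    rng = random.Random(0)
    worst = max(abs(f((rng.uniform(-1, 1), rng.uniform(-1, 1)))) for _ in range(10_000))
    print(len(centres), worst, bound)   # 64 cells; worst <= bound
```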

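We first show that, for any $\epsilon > 0$,

$$T_nP\{H_{n2} > \epsilon d_n/2\} \le T_nJ_1^2\max_{j_1,k_1}P\Big\{\Big|\sum_{i=1}^nR_{ni}(x; \alpha_{j_1}, \beta_{k_1})\Big| > \epsilon d_n/2\Big\} = O(n^{-a}) \tag{57}$$

for some $a > 1$. By Bernstein's inequality and the facts that $|R_{ni}(x; \alpha_{j_1}, \beta_{k_1})| \le Ca_{n\vartheta}$ and $\operatorname{Var}\{R_{ni}(x; \alpha_{j_1}, \beta_{k_1})\} = O[ha^2_{n\vartheta}\{a_{n\vartheta} + (nh/\log n)^{-1/2}\}]$, we have

$$T_nJ_1^2\max_{j_1,k_1}P\Big\{\Big|\sum_{i=1}^nR_{ni}(x; \alpha_{j_1}, \beta_{k_1})\Big| > \epsilon d_n/2\Big\} \le CT_nJ_1^2\exp\big[-C\epsilon^2nha^2_{n\vartheta}/\{a_{n\vartheta} + (nh/\log n)^{-1/2}\}\big] = O(n^{-a})$$

for some $a > 1$. Therefore, (57) holds.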



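We next consider $H_{n1}$. For each $j_1 = 1, \dots, J_1$ and $i = 1, 2$, partition each rectangle $D^{(i)}_{j_1}$ further into a sequence of subrectangles $D^{(i)}_{j_11}, \dots, D^{(i)}_{j_1J_2}$. Repeat this process recursively as follows. Suppose that after the $l$th round we get a sequence of rectangles $D^{(i)}_{j_1j_2\cdots j_l}$ with $1 \le j_k \le J_k$, $1 \le k \le l$; then in the $(l+1)$th round, each rectangle $D^{(i)}_{j_1j_2\cdots j_l}$ is partitioned into a sequence of subrectangles $\{D^{(i)}_{j_1\cdots j_lj_{l+1}},\ 1 \le j_{l+1} \le J_{l+1}\}$ such that for all $1 \le j_{l+1} \le J_{l+1}$ and for all $\alpha, \alpha' \in D^{(1)}_{j_1\cdots j_{l+1}}$ we have $|\alpha - \alpha'| \le M^\vartheta_{n1}/M^{l+1}$, and for all $\beta = [b_1, b_2\theta_0^\top]^\top$, $\beta' = [b_1', b_2'\theta_0^\top]^\top \in D^{(2)}_{j_1\cdots j_{l+1}}$ we have $|b_1 - b_1'| \le M^\vartheta_{n2}/M^{l+1}$ and $|b_2 - b_2'| \le M^\vartheta_{n3}/M^{l+1}$, where $J_{l+1} \le M^{d+1}$.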

End this process after the $(L_n + 2)$th round, with $L_n$ being the largest integer such that

$$n(2/M)^{L_n} > d_n/M^\vartheta_{n2}. \tag{58}$$

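Let $\mathcal D^{(i)}_l$, $i = 1, 2$, denote the set of all subrectangles of $D^{(i)}_0$ after the $l$th round of partition, and denote a typical element $D^{(i)}_{j_1j_2\cdots j_l}$ of $\mathcal D^{(i)}_l$ by $D^{(i)}_{(j_l)}$. Choose a point $\alpha_{(j_l)} \in D^{(1)}_{(j_l)}$ and $\beta_{(k_l)} \in D^{(2)}_{(k_l)}$. Define

$$V_l = \sum_{(j_{l+1}),(k_{l+1})}P\Big\{\Big|\sum_{i=1}^n\{R_{ni}(x; \alpha_{(j_l)}, \beta_{(k_l)}) - R_{ni}(x; \alpha_{(j_{l+1})}, \beta_{(k_{l+1})})\}\Big| > \frac{\epsilon d_n}{2^{l+1}}\Big\}, \quad 1 \le l \le L_n + 1,$$

$$Q_l = \sum_{(j_l),(k_l)}P\Big\{\sup_{\alpha\in D^{(1)}_{(j_l)},\ \beta\in D^{(2)}_{(k_l)}}\Big|\sum_{i=1}^n\{R_{ni}(x; \alpha_{(j_l)}, \beta_{(k_l)}) - R_{ni}(x; \alpha, \beta)\}\Big| > \frac{\epsilon d_n}{2^l}\Big\}, \quad 1 \le l \le L_n + 2.$$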



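Then $Q_l \le V_l + Q_{l+1}$, $1 \le l \le L_n + 1$. On the other hand, it is easy to see that for any $\alpha \in D^{(1)}_{(j_{L_n+2})}$ and $\beta \in D^{(2)}_{(k_{L_n+2})}$,

$$n|R_{ni}(x; \alpha_{(j_{L_n+2})}, \beta_{(k_{L_n+2})}) - R_{ni}(x; \alpha, \beta)| \le nM^\vartheta_{n2}/M^{L_n+2} < \epsilon d_n/2^{L_n+2}$$

due to the choice of $L_n$ specified in (58). Therefore, $Q_{L_n+2} = 0$ and it remains to show that

$$T_nP\{H_{n1} > \epsilon d_n/2\} \le T_nQ_1 \le T_n\sum_{l=1}^{L_n+1}V_l = O(n^{-a}) \quad \text{for some } a > 1. \tag{59}$$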
To find an upper bound for $V_l$, $1 \le l \le L_n + 1$, we again apply Bernstein's inequality. As

$$|R_{ni}(x; \alpha_{(j_l)}, \beta_{(k_l)}) - R_{ni}(x; \alpha_{(j_{l+1})}, \beta_{(k_{l+1})})| \le C\{|\alpha_{(j_l)} - \alpha_{(j_{l+1})}| + |\beta_{(k_l)} - \beta_{(k_{l+1})}|\} \le CM^\vartheta_{n2}/M^l$$

and

$$E|R_{ni}(x; \alpha_{(j_l)}, \beta_{(k_l)}) - R_{ni}(x; \alpha_{(j_{l+1})}, \beta_{(k_{l+1})})|^2 \le h(M^\vartheta_{n2})^3/M^l,$$

we have

$$V_l \le \Big(\prod_{i=1}^{l+1}J_i\Big)\exp(-C\epsilon^2nha_{n\vartheta}),$$

and (59) thus holds. This together with (57) completes the proof. ■
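Lemma 6.14 Let $Z_{ij} = K^\vartheta_{ij}[\varphi(Y_i - \hat a_j - \hat b_j\theta_0^\top X_{ij}) - \varphi(Y_i - a_j - b_j\theta_0^\top X_{ij})]\hat b_jX_{ij}$. Then

$$\sum_{i,j}(Z_{ij} - EZ_{ij}) = o(n^2ha_{n\vartheta}). \tag{60}$$

Proof As $\hat a_j - a_j = O(a_{n\vartheta})$, $\hat b_j - b_j = O\{a_{n\vartheta} + (nh/\log n)^{-1/2}/h\}$ and, for any $\epsilon > 0$,

$$P\Big\{\Big|\sum_{i,j}(Z_{ij} - EZ_{ij})\Big| > \epsilon n^2ha_{n\vartheta}\Big\} \le nP\Big\{\Big|\sum_i(Z_{ij} - EZ_{ij})\Big| > \epsilon nha_{n\vartheta}\Big\},$$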
then (60) would follow if we could show that for any $x$,

$$P\Big\{\sup_{a\in B^{(1)}_n,\ b\in B^{(2)}_n}\Big|\sum_{i=1}^nR_{ix}(a, b)\Big| > \epsilon nha_{n\vartheta}\Big\} = O(n^{-a}) \quad \text{for some } a > 2, \tag{61}$$

where $B^{(1)}_n = \{a \in \mathbb R : |a - a_x| \le ca_{n\vartheta}\}$, $B^{(2)}_n = \{b \in \mathbb R : |b - b_x| \le c\{a_{n\vartheta} + (nh/\log n)^{-1/2}/h\}\}$, $a_x = m(\theta_0^\top x)$, $b_x = m'(\theta_0^\top x)$, $R_{ix}(a, b) = Z_{ix}(a, b) - EZ_{ix}(a, b)$, $K_{ix} = K(X_{ix}^\top\vartheta/h)$ and $Z_{ix}(a, b) = K_{ix}X_{ix}[\varphi(Y_i - a_x - b_x\theta_0^\top X_{ix}) - \varphi(Y_i - a - b\theta_0^\top X_{ix})]$. To this end, partition $B^{(i)}_n$, $i = 1, 2$, into a sequence of subrectangles $D^{(i)}_1, \dots, D^{(i)}_{J_1}$, $i = 1, 2$, such that

$$|D^{(i)}_{j_1}| = \sup\{|a - a'| : a, a' \in D^{(i)}_{j_1}\} \le M^{(i)}_n/M, \quad 1 \le j_1 \le J_1,$$

where $M^{(1)}_n = ca_{n\vartheta}$, $M^{(2)}_n = c\{a_{n\vartheta} + (nh/\log n)^{-1/2}/h\}$, $M = \epsilon^{-1}$ and $J_1 \le M$. Choose a point $a_{j_1} \in D^{(1)}_{j_1}$ and $b_{k_1} \in D^{(2)}_{k_1}$. Then

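$$\begin{aligned}
\sup_{a\in B^{(1)}_n,\ b\in B^{(2)}_n}\Big|\sum_{i=1}^nR_{ix}(a, b)\Big| &\le \max_{1\le j_1,k_1\le J_1}\ \sup_{a\in D^{(1)}_{j_1},\ b\in D^{(2)}_{k_1}}\Big|\sum_{i=1}^n\{R_{ix}(a_{j_1}, b_{k_1}) - R_{ix}(a, b)\}\Big|\\
&\quad + \max_{1\le j_1,k_1\le J_1}\Big|\sum_{i=1}^nR_{ix}(a_{j_1}, b_{k_1})\Big| = H_{n1} + H_{n2}.
\end{aligned} \tag{62}$$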

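We first consider $H_{n2}$:

$$P\Big\{H_{n2} > \frac{\epsilon nha_{n\vartheta}}{2}\Big\} \le \sum_{j_1,k_1}P\Big\{\Big|\sum_{i=1}^nR_{ix}(a_{j_1}, b_{k_1})\Big| > \frac{\epsilon nha_{n\vartheta}}{2}\Big\}.$$

As $R_{ix}(a_{j_1}, b_{k_1})$ is bounded and $\operatorname{Var}\{R_{ix}(a_{j_1}, b_{k_1})\} = O[h\{a_{n\vartheta} + (nh/\log n)^{-1/2}\}]$, by Bernstein's inequality we have

$$\sum_{j_1,k_1}P\Big\{\Big|\sum_{i=1}^nR_{ix}(a_{j_1}, b_{k_1})\Big| > \frac{\epsilon nha_{n\vartheta}}{2}\Big\} \le CJ_1^2\exp(-C\epsilon^2n^{1/2}h^{3/2}) = O(n^{-a})$$

for some $a > 2$.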

We next consider $H_{n1}$. For each $j_1 = 1, \dots, J_1$ and $i = 1, 2$, partition each rectangle $D^{(i)}_{j_1}$ further into a sequence of subrectangles $D^{(i)}_{j_11}, \dots, D^{(i)}_{j_1J_2}$. Repeat this process recursively as follows. Suppose that after the $l$th round we get a sequence of rectangles $D^{(i)}_{j_1j_2\cdots j_l}$ with $1 \le j_k \le J_k$, $1 \le k \le l$; then in the $(l+1)$th round, each rectangle $D^{(i)}_{j_1j_2\cdots j_l}$ is partitioned into a sequence of subrectangles $\{D^{(i)}_{j_1\cdots j_lj_{l+1}},\ 1 \le j_{l+1} \le J_{l+1}\}$ such that

$$|D^{(i)}_{j_1\cdots j_{l+1}}| = \sup\{|a - a'| : a, a' \in D^{(i)}_{j_1\cdots j_{l+1}}\} \le M^{(i)}_n/M^{l+1}, \quad 1 \le j_{l+1} \le J_{l+1},$$

where $J_{l+1} \le M$. End this process after the $(L_n + 2)$th round, with $L_n$ being the smallest integer such that

$$(2/M)^{L_n} < a_{n\vartheta}/M^{(2)}_n \quad \big[\text{which means } 2^{L_n} \le 2\{M^{(2)}_n/a_{n\vartheta}\}^{\log 2/\log(M/2)}\big]. \tag{63}$$
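Condition (63) fixes how many refinement rounds are needed. Read as $(M/2)^{L_n} > M^{(2)}_n/a_{n\vartheta}$ with $L_n$ minimal and $M > 2$, minimality gives $(M/2)^{L_n - 1} \le M^{(2)}_n/a_{n\vartheta}$, hence $2^{L_n} \le 2\{M^{(2)}_n/a_{n\vartheta}\}^{\log 2/\log(M/2)}$, so the number of rounds grows only logarithmically in the ratio. A short sketch with arbitrary illustrative inputs (`smallest_rounds` is a hypothetical helper, and `ratio` stands in for $M^{(2)}_n/a_{n\vartheta} \ge 1$):

```python
import math


def smallest_rounds(ratio, m):
    """Smallest integer L >= 0 with (m / 2) ** L > ratio, for m > 2 and ratio >= 1."""
    if m <= 2:
        raise ValueError("need m > 2")
    rounds = 0
    while (m / 2) ** rounds <= ratio:
        rounds += 1
    return rounds


if __name__ == "__main__":
    m = 10.0                                # M = 1 / epsilon
    for ratio in (1.0, 37.5, 1e4, 1e8):     # stands in for M_n^(2) / a_n
        L = smallest_rounds(ratio, m)
        cap = 2 * ratio ** (math.log(2) / math.log(m / 2))
        print(ratio, L, 2 ** L, cap)        # 2**L never exceeds cap
```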

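Let $\mathcal D^{(i)}_l$, $i = 1, 2$, denote the set of all subrectangles of $D^{(i)}_0$ after the $l$th round of partition, and denote a typical element $D^{(i)}_{j_1j_2\cdots j_l}$ of $\mathcal D^{(i)}_l$ by $D^{(i)}_{(j_l)}$. Choose a point $a_{(j_l)} \in D^{(1)}_{(j_l)}$ and $b_{(k_l)} \in D^{(2)}_{(k_l)}$, and define

$$V_l = \sum_{(j_{l+1}),(k_{l+1})}P\Big\{\Big|\sum_{i=1}^n\{R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a_{(j_{l+1})}, b_{(k_{l+1})})\}\Big| > \frac{\epsilon nha_{n\vartheta}}{2^{l+1}}\Big\}, \quad 1 \le l \le L_n + 1,$$

$$Q_l = \sum_{(j_l),(k_l)}P\Big\{\sup_{a\in D^{(1)}_{(j_l)},\ b\in D^{(2)}_{(k_l)}}\Big|\sum_{i=1}^n\{R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a, b)\}\Big| > \frac{\epsilon nha_{n\vartheta}}{2^l}\Big\}, \quad 1 \le l \le L_n + 2.$$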

Then $Q_l \le V_l + Q_{l+1}$, $1 \le l \le L_n + 1$. We first give a bound for $V_l$, $1 \le l \le L_n + 1$. As $R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a_{(j_{l+1})}, b_{(k_{l+1})})$ is bounded and

$$E|R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a_{(j_{l+1})}, b_{(k_{l+1})})|^2 \le h\{a_{n\vartheta} + (nh/\log n)^{-1/2}\}/M^{l+1},$$

applying Bernstein's inequality and using (63), we have

$$V_l \le \Big(\prod_{i=1}^{l+1}J_i\Big)\exp(-\epsilon^2n^{1/2}h^{3/2}). \tag{64}$$

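We now focus on $Q_{L_n+2}$. Recall the definition of $Z_{ix}(a, b)$:

$$Z_{ix}(a, b) = K_{ix}[\varphi(Y_i - a_x - b_x\theta_0^\top X_{ix}) - \varphi(Y_i - a - b\theta_0^\top X_{ix})]X_{ix}.$$

For any $a \in D^{(1)}_{(j_l)}$ and $b \in D^{(2)}_{(k_l)}$, let $I^{a,b}_i = 1$ if there is a discontinuity point of $\varphi(\cdot)$ between $Y_i - a_{(j_l)} - b_{(k_l)}\theta_0^\top X_{ix}$ and $Y_i - a - b\theta_0^\top X_{ix}$, and $I^{a,b}_i = 0$ otherwise. Write

$$R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a, b) = \{R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a, b)\}I^{a,b}_i + \{R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a, b)\}(1 - I^{a,b}_i).$$

Then we have $|\{R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a, b)\}(1 - I^{a,b}_i)| \le C\{a_{n\vartheta} + (nh/\log n)^{-1/2}\}/M^l$ and, specifically, for $l = L_n + 2$,

$$\begin{aligned}
&P\Big\{\sup_{a\in D^{(1)}_{(j_l)},\ b\in D^{(2)}_{(k_l)}}\Big|\sum_{i=1}^n\{R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a, b)\}(1 - I^{a,b}_i)\Big| > \frac{\epsilon nha_{n\vartheta}}{2^{L_n+3}}\Big\}\\
&\le P\Big\{\sum_{i=1}^nU_i > C_1Mnh\Big\} \le P\Big\{\sum_{i=1}^n(U_i - EU_i) > \frac{C_1Mnh}{2}\Big\},
\end{aligned}$$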
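where $U_i = I\{|X_{ix}^\top\vartheta| < h\}$, $C_1 > 0$ is a constant, and the first inequality is due to (63). By Bernstein's inequality, this in turn implies that for $l = L_n + 2$

$$\Big(\prod_{j=1}^{l+1}J_j\Big)P\Big\{\sup_{a\in D^{(1)}_{(j_l)},\ b\in D^{(2)}_{(k_l)}}\Big|\sum_{i=1}^n\{R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a, b)\}(1 - I^{a,b}_i)\Big| > \frac{\epsilon nha_{n\vartheta}}{2^{L_n+3}}\Big\} = O(n^{-a}) \tag{65}$$

for some $a > 2$. Now we have to show a similar result for

$$\Big(\prod_{j=1}^{l+1}J_j\Big)P\Big\{\sup_{a\in D^{(1)}_{(j_l)},\ b\in D^{(2)}_{(k_l)}}\Big|\sum_{i=1}^n\{R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a, b)\}I^{a,b}_i\Big| > \frac{\epsilon nha_{n\vartheta}}{2^{L_n+3}}\Big\}, \qquad l = L_n + 2.$$

Note that for any $a \in D^{(1)}_{(j_l)}$ and $b \in D^{(2)}_{(k_l)}$, $I^{a,b}_i \le I\{Y_i \in S_i\}$, where

$$S_i = [a_{(j_l)} + b_{(k_l)}\theta_0^\top X_{ix} - CM^{(2)}_n/M^l,\ a_{(j_l)} + b_{(k_l)}\theta_0^\top X_{ix} + CM^{(2)}_n/M^l],$$

which is independent of $a$ and $b$. Let $U_i = I\{|X_{ix}^\top\vartheta| < h\}I\{Y_i \in S_i\}$. As $R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a, b)$ is bounded, we have for $l = L_n + 2$,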

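$$\begin{aligned}
&P\Big\{\sup_{a\in D^{(1)}_{(j_l)},\ b\in D^{(2)}_{(k_l)}}\Big|\sum_{i=1}^n\{R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a, b)\}I^{a,b}_i\Big| > \frac{\epsilon nha_{n\vartheta}}{2^{L_n+3}}\Big\}\\
&\le P\Big\{\sum_{i=1}^nU_i > \frac{\epsilon nha_{n\vartheta}}{C2^{L_n+3}}\Big\} \le P\Big\{\sum_{i=1}^n(U_i - EU_i) > \frac{\epsilon nha_{n\vartheta}}{C2^{L_n+4}}\Big\},
\end{aligned} \tag{66}$$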

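where the second inequality is due to (63). Applying Bernstein's inequality to the right-hand side of (66) and using (63), we have

$$\Big(\prod_{j=1}^{l+1}J_j\Big)P\Big\{\sup_{a\in D^{(1)}_{(j_l)},\ b\in D^{(2)}_{(k_l)}}\Big|\sum_{i=1}^n\{R_{ix}(a_{(j_l)}, b_{(k_l)}) - R_{ix}(a, b)\}I^{a,b}_i\Big| > \frac{\epsilon nha_{n\vartheta}}{2^{L_n+3}}\Big\} = O(n^{-a}), \qquad l = L_n + 2,$$

for some $a > 2$. This together with (65) implies that $Q_{L_n+2} = O(n^{-a})$ for some $a > 2$. Therefore, combining this with the bound (64) on $V_l$, we have

$$P\Big\{H_{n1} > \frac{\epsilon nha_{n\vartheta}}{2}\Big\} \le Q_1 \le \sum_{l=1}^{L_n+1}V_l + Q_{L_n+2} = O(n^{-a})$$

for some $a > 2$.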



Lemma 6.15 For all large enough $M > 0$, $Q_2 \le Md_n'$ a.s., where

$$d_n' = nha^2_{n\vartheta}\frac{l_n}{h}\{1 + a^{-1}_{n\vartheta}(nh/\log n)^{-1/2}\} = o(nha^2_{n\vartheta}).$$
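Proof Let $X_{ik} = X_i - x_k$, $\mu_{ik} = (1, X_{ik}^\top)^\top$, $K_{ik} = K(X_{ik}^\top\vartheta/h)$ and write $\Phi_{ni}(x_k; \alpha, \beta) - \Phi_{ni}(x; \alpha, \beta) = \zeta_{i1} + \zeta_{i2} + \zeta_{i3}$, where

$$\zeta_{i1} = (K_{ik}\mu_{ik} - K_{ix}\mu_{ix})^\top\alpha\int_0^1\{\varphi_{ni}(x_k; \mu_{ik}^\top(\beta + \alpha t)) - \varphi_{ni}(x_k; 0)\}\,dt,$$

$$\zeta_{i2} = K_{ix}\mu_{ix}^\top\alpha\int_0^1\{\varphi_{ni}(x_k; \mu_{ik}^\top(\beta + \alpha t)) - \varphi_{ni}(x; \mu_{ix}^\top(\beta + \alpha t))\}\,dt,$$

$$\zeta_{i3} = K_{ix}\mu_{ix}^\top\alpha\{\varphi_{ni}(x; 0) - \varphi_{ni}(x_k; 0)\}.$$

Then $P(Q_2 > M^{3/2}d_n'/3) \le T_n(P_{n1} + P_{n2} + P_{n3})$, where

$$P_{nj} = \max_{1\le k\le T_n}P\Big(\sup_{x\in\mathcal D_k}\ \sup_{\alpha\in B^{(1)}_n,\ \beta\in B^{(2)}_n}\Big|\sum_{i=1}^n\zeta_{ij}\Big| > M^{3/2}d_n'/9\Big), \quad j = 1, 2, 3.$$

By the Borel-Cantelli lemma, $Q_2 \le M^{3/2}d_n'$ almost surely if $\sum_nT_nP_{nj} < \infty$, $j = 1, 2, 3$. Again this can be accomplished through an approach similar to that of Lemma 5.1 in Kong et al. (2008). We only deal with $P_{n1}$ to illustrate.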

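First note that if $\zeta_{i1} \neq 0$, then either $K_{ik} \neq 0$ or $K_{ix} \neq 0$. Without loss of generality, suppose $K_{ik} \neq 0$, i.e. $|X_{ik}^\top\vartheta| \le h$, whence $|X_{ik}^\top\theta_0| \le h + C|\delta_\vartheta|$ and $|\mu_{ik}^\top(\beta + \alpha t)| \le C(M^\vartheta_{n1} + M^\vartheta_{n2})$.

For any fixed $\alpha \in B^{(1)}_n$ and $\beta \in B^{(2)}_n$, let $I^{\alpha,\beta}_i = 1$ if there exists some $t \in [0, 1]$ such that there are discontinuity points of $\varphi(Y_i - \cdot)$ between $\mu_{ik}^\top\{\beta(x_k) + \beta + \alpha t\}$ and $\mu_{ik}^\top\beta(x_k)$, and $I^{\alpha,\beta}_i = 0$ otherwise. Write $\zeta_{i1} = \zeta_{i1}I^{\alpha,\beta}_i + \zeta_{i1}(1 - I^{\alpha,\beta}_i)$. As $|(K_{ik}\mu_{ik} - K_{ix}\mu_{ix})^\top\alpha| \le CM^\vartheta_{n1}l_n/h$ and $|\mu_{ik}^\top(\beta + \alpha t)| \le CM^\vartheta_{n2}$, we have

$$|\zeta_{i1}(1 - I^{\alpha,\beta}_i)| \le CM^\vartheta_{n1}M^\vartheta_{n2}l_n/h = o(a^2_{n\vartheta})$$

uniformly in $i$, $\alpha$, $\beta$ and $x \in \mathcal D_k$, if $nh^3/\log^3 n \to \infty$. Let $U_{ik} = I\{|X_{ik}^\top\vartheta| < 2h\}$. As $\zeta_{i1} = \zeta_{i1}U_{ik}$ (because $l_n = o(h)$), we have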



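$$\begin{aligned}
&P\Big(\sup_{x\in\mathcal D_k}\ \sup_{\alpha\in B^{(1)}_n,\ \beta\in B^{(2)}_n}\Big|\sum_{i=1}^n\zeta_{i1}(1 - I^{\alpha,\beta}_i)\Big| > \frac{Md_n'}{18}\Big) \le P\Big(\sum_{i=1}^nU_{ik} > \frac{Md_n'h}{18CM^\vartheta_{n1}M^\vartheta_{n2}l_n}\Big)\\
&\le P\Big(\Big|\sum_{i=1}^n(U_{ik} - EU_{ik})\Big| > \frac{Md_n'h}{36CM^\vartheta_{n1}M^\vartheta_{n2}l_n}\Big),
\end{aligned} \tag{67}$$

where the second inequality follows from the fact that $EU_{ik} = O(h)$. We can then apply to (67) Bernstein's inequality for independent data, or Lemma 5.4 in Kong et al. (2008) for the dependent case, to obtain

$$T_nP\Big(\sup_{x\in\mathcal D_k}\ \sup_{\alpha\in B^{(1)}_n,\ \beta\in B^{(2)}_n}\Big|\sum_{i=1}^n\zeta_{i1}(1 - I^{\alpha,\beta}_i)\Big| > Md_n'/18\Big) \text{ is summable over } n, \tag{68}$$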
whence $\sum_nT_nP_{n1} < \infty$ will follow once we show that

$$T_nP\Big(\sup_{x\in\mathcal D_k}\ \sup_{\alpha\in B^{(1)}_n,\ \beta\in B^{(2)}_n}\Big|\sum_{i=1}^n\zeta_{i1}I^{\alpha,\beta}_i\Big| > Md_n'/18\Big) \text{ is summable over } n. \tag{69}$$

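To this end, first note that $I^{\alpha,\beta}_i \le I\{\varepsilon_i \in S^{\alpha,\beta}_{ik}\}$, where, with $a_1, \dots, a_m$ the discontinuity points of $\varphi(\cdot)$,

$$S^{\alpha,\beta}_{ik} = \bigcup_{j=1}^m\ \bigcup_{t\in[0,1]}[a_j - \Delta(X_i, x_k) + \mu_{ik}^\top(\beta + \alpha t),\ a_j - \Delta(X_i, x_k)] \subset \bigcup_{j=1}^m[a_j - CM^\vartheta_{n2},\ a_j + CM^\vartheta_{n2}] = D_n \quad \text{for some } C > 0,$$

$$\Delta(x_1, x_2) = m(x_1^\top\theta_0) - m(x_2^\top\theta_0) - m'(x_2^\top\theta_0)(x_1 - x_2)^\top\theta_0,$$

where in the derivation of $S^{\alpha,\beta}_{ik} \subset D_n$ we have used the facts that $|X_{ik}^\top\vartheta| < 2h$, $\mu_{ik}^\top(\beta + \alpha t) = O(M^\vartheta_{n2})$ and $\Delta(X_i, x_k) = O(h^2 + |\delta_\vartheta|^2) = o(M^\vartheta_{n2})$ uniformly in $i$. As $I^{\alpha,\beta}_i \le I\{\varepsilon_i \in D_n\}$, we have $|\zeta_{i1}|I^{\alpha,\beta}_i \le |\zeta_{i1}|U_{ni}$, where $U_{ni} = I(|X_{ik}^\top\vartheta| < 2h)I\{\varepsilon_i \in D_n\}$, which is independent of the choice of $\alpha$ and $\beta$. Therefore,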

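$$\begin{aligned}
&P\Big(\sup_{x\in\mathcal D_k}\ \sup_{\alpha\in B^{(1)}_n,\ \beta\in B^{(2)}_n}\Big|\sum_{i=1}^n\zeta_{i1}I^{\alpha,\beta}_i\Big| > \frac{Md_n'}{18}\Big) \le P\Big(\sum_{i=1}^nU_{ni} > \frac{MnhM^\vartheta_{n2}}{18C}\Big)\\
&\le P\Big(\sum_{i=1}^n(U_{ni} - EU_{ni}) > \frac{MnhM^\vartheta_{n2}}{36C}\Big),
\end{aligned} \tag{70}$$

where the first inequality holds because $|\zeta_{i1}| \le Ca_{n\vartheta}l_n/h$ and the second because $EU_{ni} = O(hM^\vartheta_{n2})$. Similarly to (67), we can apply either Bernstein's inequality for independent data or, in the dependent case, Lemma 5.4 in Kong et al. (2008) to see that (69) indeed holds. ■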

Lemma 6.16 All eigenvalues of $(S_2 + \theta_0\theta_0^\top)^{-1}(\Omega_0 + \theta_0\theta_0^\top)$ fall into the interval (0, 1).

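Proof By the Cauchy-Schwarz inequality, for any $x \in \mathbb R^d$,

$$E\{g(X)(X - x) \mid X^\top\vartheta = x^\top\vartheta\}E\{g(X)(X - x) \mid X^\top\vartheta = x^\top\vartheta\}^\top \le E\{g(X) \mid X^\top\vartheta = x^\top\vartheta\}E\{g(X)(X - x)(X - x)^\top \mid X^\top\vartheta = x^\top\vartheta\},$$

which is equivalent to

$$\{\nu_\vartheta(x) - x\mu_\vartheta(x)\}\{\nu_\vartheta(x) - x\mu_\vartheta(x)\}^\top \le \mu_\vartheta(x)\omega_\vartheta(x)$$

or

$$\mu_\vartheta(x)\{(\nu/\mu)_\vartheta(x) - x\}\{(\nu/\mu)_\vartheta(x) - x\}^\top \le \omega_\vartheta(x).$$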
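The matrix inequality above is Cauchy-Schwarz applied direction by direction: for every vector $u$, $\{E\,g\,u^\top(X - x)\}^2 \le E\,g \cdot E\,g\{u^\top(X - x)\}^2$, with equality only if $u^\top(X - x)$ is constant where $g > 0$, which is exactly the equality case exploited in (71). A sketch with a discrete conditional distribution, where the weights stand in for $g$ times conditional probabilities; all inputs are illustrative assumptions.

```python
import random


def cs_gap(weights, vectors, u):
    """u^T [ (sum w)(sum w v v^T) - (sum w v)(sum w v)^T ] u with v = X - x.

    Cauchy-Schwarz makes this non-negative whenever all weights are >= 0.
    """
    proj = [sum(ui * vi for ui, vi in zip(u, v)) for v in vectors]
    s0 = sum(weights)
    s1 = sum(w * p for w, p in zip(weights, proj))
    s2 = sum(w * p * p for w, p in zip(weights, proj))
    return s0 * s2 - s1 * s1


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 3, 40
    weights = [rng.random() for _ in range(k)]                          # g(X) > 0
    vectors = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]   # X - x
    u = [rng.gauss(0, 1) for _ in range(d)]
    print(cs_gap(weights, vectors, u))   # strictly positive here
    # Equality case: u^T (X - x) takes the same value for every support point.
    flat = [[1.0, t, -t] for t in (0.3, -1.2, 2.5)]
    print(cs_gap([0.2, 0.5, 0.3], flat, [2.0, 1.0, 1.0]))   # 0 up to rounding
```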
Multiplying both sides by $\{m'(\vartheta^\top x)\}^2$ and taking expectations, we have $S_2 - \Omega_0 \ge 0$, which can be strengthened to $S_2 - \Omega_0 > 0$. This is because if there exists some $\vartheta_1 \neq 0$ such that $\vartheta_1^\top(S_2 - \Omega_0)\vartheta_1 = 0$, then for any $x$ there exists some $C$ such that

$$\{g(X)\}^{1/2}\vartheta_1^\top(X - x) = C\{g(X)\}^{1/2} \quad \text{for all } X^\top\vartheta = x^\top\vartheta,$$

$$\Rightarrow\ \vartheta_1^\top(X - x) = C \quad \text{for all } X^\top\vartheta = x^\top\vartheta\ \Rightarrow\ \vartheta_1 = \vartheta. \tag{71}$$

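A sufficient condition for $(S_2 + \theta_0\theta_0^\top)^{-1}(\Omega_0 + \theta_0\theta_0^\top)$ to have only positive eigenvalues is that $\theta_0$ is the sole eigenvector of $S_2$ and $\Omega_0$ that corresponds to eigenvalue 0. We argue this by contradiction. Suppose there exists some $\vartheta$ such that $\vartheta \perp \theta_0$ and

$$E\{g(X)\vartheta^\top(X - x)(X - x)^\top\vartheta \mid \theta_0^\top X = \theta_0^\top x\} = 0 \quad \text{for any } x \in \mathbb R^d, \tag{72}$$

$$E\{g(X)\vartheta^\top(X - x) \mid \theta_0^\top X = \theta_0^\top x\} = 0 \quad \text{for any } x \in \mathbb R^d. \tag{73}$$

Note that as $g(X) > 0$, (72) in fact implies that $E[\{\vartheta^\top(X - x)\}^2 \mid \theta_0^\top X = \theta_0^\top x] = 0$, which in turn means that $\vartheta = \theta_0$; this contradicts the fact that $\vartheta \perp \theta_0$.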

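To show that (73) cannot be true, let $\{b_1, \dots, b_{d-1}\}$ constitute an orthogonal basis of the orthogonal complement of $\theta_0$. Let $x = b_i$, $i = 1, \dots, d - 1$; then $\theta_0^\top x = 0$ and from (73) we have

$$E\{g(X)\vartheta^\top(X - b_i) \mid \theta_0^\top X = 0\} = 0 \;\Rightarrow\; \vartheta^\top E\{g(X)X \mid \theta_0^\top X = 0\} = \vartheta^\top b_iE\{g(X) \mid \theta_0^\top X = 0\}.$$

As $E\{g(X)X \mid \theta_0^\top X = 0\}$ and $E\{g(X) \mid \theta_0^\top X = 0\}$ are constants (a vector and a scalar) independent of $b_i$, and $E\{g(X)X \mid \theta_0^\top X = 0\} \perp \theta_0$, there exists some vector $b \perp \theta_0$ such that

$$\vartheta^\top b = \vartheta^\top b_i,\ i = 1, \dots, d - 1 \;\Rightarrow\; \vartheta^\top(b - b_i) = 0,\ i = 1, \dots, d - 1,$$

but this cannot be true unless $\vartheta \perp b_i$ for all $i = 1, \dots, d - 1$.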

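Next we show that $\lambda_{\max} < 1$ by contradiction. If not, suppose $x$ is the corresponding eigenvector. Then

$$(S_2 + \theta_0\theta_0^\top)^{-1}(\Omega_0 + \theta_0\theta_0^\top)x = \lambda_{\max}x \;\Rightarrow\; (\Omega_0 + \theta_0\theta_0^\top)x = \lambda_{\max}(S_2 + \theta_0\theta_0^\top)x$$

$$\Rightarrow\; x^\top(\Omega_0 + \theta_0\theta_0^\top)x = \lambda_{\max}x^\top(S_2 + \theta_0\theta_0^\top)x \;\Rightarrow\; x^\top\Omega_0x \ge \lambda_{\max}x^\top S_2x \quad (\text{since } \lambda_{\max} \ge 1),$$

which contradicts the fact that $S_2 - \Omega_0 > 0$ if $x \neq \theta_0$.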
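A numerical sanity check of Lemma 6.16, assuming NumPy is available. The matrices are synthetic: $\Omega_0$ and $S_2$ are built only to satisfy the two properties the proof uses, namely $\Omega_0\theta_0 = S_2\theta_0 = 0$ and $S_2 - \Omega_0$ positive definite on the orthogonal complement of $\theta_0$; they are not computed from any model. With $|\theta_0| = 1$ the direction $\theta_0$ is itself an eigenvector with eigenvalue exactly 1, consistent with the final step of the proof, which excludes $\lambda_{\max} \ge 1$ only for $x \neq \theta_0$; every other eigenvalue lands strictly inside (0, 1).

```python
import numpy as np


def synthetic_pair(d, rng):
    """Unit theta0 and PSD matrices omega0, s2 that annihilate theta0,
    with s2 - omega0 positive definite on the complement of theta0."""
    theta0 = rng.standard_normal(d)
    theta0 /= np.linalg.norm(theta0)
    proj = np.eye(d) - np.outer(theta0, theta0)   # projector onto theta0-perp
    a = rng.standard_normal((d, d))
    b = rng.standard_normal((d, d))
    omega0 = proj @ (a @ a.T) @ proj
    s2 = omega0 + proj @ (b @ b.T + np.eye(d)) @ proj
    return theta0, s2, omega0


def lemma_616_eigenvalues(theta0, s2, omega0):
    """Sorted eigenvalues of (S2 + theta0 theta0^T)^{-1} (Omega0 + theta0 theta0^T)."""
    tt = np.outer(theta0, theta0)
    mat = np.linalg.solve(s2 + tt, omega0 + tt)
    eig = np.linalg.eigvals(mat)
    return np.sort(eig.real), np.max(np.abs(eig.imag))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta0, s2, omega0 = synthetic_pair(4, rng)
    eig, imag = lemma_616_eigenvalues(theta0, s2, omega0)
    print(eig)    # three values in (0, 1) and one equal to 1 (direction theta0)
    print(imag)   # ~0: the spectrum is real, as for a symmetric-definite pair
```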